We study nonparametric estimation of the diffusion coefficient from discrete data, when the observations are blurred by additional noise. Such issues have been developed over the last 10 years in several application fields and in particular in high frequency financial data modelling, however mainly from a parametric and semiparametric point of view. This paper addresses the nonparametric estimation of the path of the (possibly stochastic) diffusion coefficient in a relatively general setting.

By developing pre-averaging techniques combined with wavelet thresholding, we construct adaptive estimators that achieve a nearly optimal rate within a large scale of smoothness constraints of Besov type. Since the diffusion coefficient is usually genuinely random, we propose a new criterion to assess the quality of estimation; we retrieve the usual minimax theory when this approach is restricted to a deterministic diffusion coefficient. In particular, we take advantage of recent results of Reiß [33] of asymptotic equivalence between a Gaussian diffusion with additive noise and Gaussian white noise model, in order to prove a sharp lower bound.
1 Introduction

We are interested in the following statistical setting: we assume that we have real-valued data of the form

$$Z_{j,n} = X_{j\Delta_n} + \varepsilon_{j,n}, \quad j = 0, 1, \ldots, n, \tag{1.1}$$

where $\Delta_n > 0$ is a sampling time, $(\varepsilon_{j,n})$ is an additive noise process^1 and the continuous time process $X = (X_t)_{t \ge 0}$ has representation

$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s. \tag{1.2}$$

In other words, $X$ is an Itô continuous semimartingale driven by a Brownian motion $W = (W_t)_{t \ge 0}$ with drift $b = (b_t)$ and diffusion coefficient or volatility process $\sigma = (\sigma_t)$. This is the so-called additive microstructure noise model. We assume that the data $(Z_{j,n})$ are sampled in a high-frequency framework: the time step $\Delta_n$ between observations goes to 0, but $n\Delta_n$ remains bounded as $n \to \infty$, i.e. the whole statistical experiment is taken over a fixed time interval.

In this asymptotic framework, the only parameter that can be consistently estimated is the unobserved path of the diffusion coefficient $t \mapsto \sigma_t^2$, and unless specified otherwise, it is random. Whereas nonparametric estimation of the diffusion coefficient from direct observation of $X_{j\Delta_n}$ is a fairly well-known topic when $\sigma^2$ is deterministic ([18], [24] and the review paper of Fan [16]), nonparametric estimation in the presence of the noise $(\varepsilon_{j,n})$ substantially increases the difficulty of the statistical problem. This is the topic of the present paper, and it can be related to practical issues in several application fields. In finance for instance, by considering the $Z_{j,n}$ as the result of a latent or unobservable efficient price $X_{j\Delta_n}$ corrupted by microstructure effects $\varepsilon_{j,n}$ at scale $\Delta_n$, we obtain a more realistic model accounting for stylised facts on intraday scale usually attributed to bid-ask spread manipulation by market makers^2. Considering a diffusion perturbed by noise applies in other fields as well: in the context of functional MRI (fMRI), the problem of inference for diffusion processes with measurement error has been addressed by Donnet and Samson [13, 12] in an ergodic and parametric setting, when the sampling time $\Delta_n$ does not shrink to 0 as $n \to \infty$. See also Favetto and Samson [17]. Recently, Schmisser [36] has systematically studied the nonparametric estimation of the drift and the diffusion coefficient in an ergodic and mixed asymptotic setting, when $\Delta_n \to 0$ but $n\Delta_n \to \infty$. In this paper, we consider the nonergodic case, when only the diffusion coefficient can be identified, with $\Delta_n \to 0$ and $n\Delta_n$ fixed.

^1 Implicitly assumed to be centred for obvious identifiability purposes.

^2 This approach was grounded on empirical findings in the financial econometrics literature of the early 2000s (among many others Ait-Sahalia et al. [1], Mykland and Zhang [31] and the references therein).

1.1 Estimating the diffusion coefficient under additive noise: some history

Estimation of a finite-dimensional parameter and nonparametric functionals

The first results about statistical inference for a diffusion with measurement error go back to Gloter and Jacod [20, 21] in 2001. They showed that if $\sigma_t = \sigma(t, \vartheta)$ is a deterministic function known up to a 1-dimensional parameter $\vartheta$, and if moreover the $\varepsilon_{j,n}$ are Gaussian and independent, then the LAN (Local Asymptotic Normality) condition holds for $\Delta_n = n^{-1}$ with rate $n^{-1/4}$. This implies that, even in the simplest Gaussian diffusion case, there is a substantial loss of information compared to the case without noise, where the standard $n^{-1/2}$ accuracy of estimation is achievable.

At about the same time, the microstructure noise model for financial data was introduced by Ait-Sahalia, Mykland and Zhang in a series of papers [1, 39, 38]. Analogous approaches in various similar contexts progressively emerged in the financial econometrics literature: Podolskij and Vetter [32], Bandi and Russell [4, 3], Barndorff-Nielsen et al. [5] and the references therein. These studies tackled estimation problems in a sound mathematical framework, and incrementally gained in generality and elegance. A paradigmatic problem in this context is the estimation of the integrated volatility $\int_0^1 \sigma_s^2\,ds$. Convergent estimators were first obtained by Ait-Sahalia et al. [1] with a suboptimal rate $n^{-1/6}$. Then the two-scale approach of Zhang [38] achieved the rate $n^{-1/4}$. The Gloter-Jacod LAN property of [20] for deterministic submodels shows that this cannot be improved. Further generalizations extended the nature of the latent price model $X$ (for instance [2, 37, 11]) and the nature of the microstructure noise $(\varepsilon_{j,n})$. It took some more time and contributions before Jacod and collaborators [26] took over the topic in 2007 with their simple and powerful pre-averaging technique, introduced earlier in a simplified context by Podolskij and Vetter [32]. In essence, it consists in first smoothing the data as in signal denoising, and then applying a standard realised volatility estimator up to an appropriate bias correction. Stable convergence in law is displayed for a wide class of pre-averaged estimators in a fairly general setting, which essentially closes the issue of estimating the integrated volatility in a semiparametric setting.

Nonparametric inference 

In the nonparametric case, the problem is a little unclear. By nonparametric, one thinks of estimating the whole path $t \mapsto \sigma_t^2$. However, since $\sigma^2 = (\sigma_t^2)_{t \ge 0}$ is usually itself genuinely random, there is no "true parameter" to be estimated! When the diffusion coefficient is deterministic, the usual setting of statistical experiments is recovered. In that latter case, under the restriction that the microstructure noise process consists of i.i.d. noises, Munk and Schmidt-Hieber [30, 29] proposed a Fourier estimator and showed its minimax rate optimality, extending a previous approach for the parametric setting ([7]). This approach relies on a formal analogy with ill-posed inverse problems. When the microstructure noises $(\varepsilon_{j,n})$ are Gaussian i.i.d. with variance $\tau^2$, Reiß [33] recently showed the asymptotic equivalence in the Le Cam sense with the observation of the random measure

$$\sqrt{2\sigma} + \tau\, n^{-1/4}\, \dot{B},$$

where $\dot{B}$ is a Gaussian white noise. This is a beautiful and deep result: the normalisation $n^{-1/4}$ is illuminating when compared with the optimality results obtained by previous authors.

1.2 Our results 

The asymptotic equivalence proved in [33] provides us with a benchmark for the complexity of the statistical problem and is inspiring: we aim in this paper to bring the problem of estimating nonparametrically the random parameter $t \mapsto \sigma_t^2$ to the level of classical denoising in the adaptive minimax theory. In spirit, we follow the classical route of nonlinear estimation in de-noising, but we need to introduce new tools. Our procedure is twofold:

1. We approximate the random signal $t \mapsto \sigma_t^2$ by an atomic representation

$$\sigma_t^2 \approx \sum_{\nu \in \mathcal{V}(\sigma^2)} \langle \sigma^2, \psi_\nu \rangle\, \psi_\nu(t), \tag{1.3}$$

where $\langle \cdot, \cdot \rangle$ denotes the usual $L^2$-inner product and $(\psi_\nu, \nu \in \mathcal{V}(\sigma^2))$ is a collection of wavelet functions that are localised in time and frequency, indexed by the set $\mathcal{V}(\sigma^2)$ that depends on the path $t \mapsto \sigma_t^2$ itself. We do not yet specify the precise meaning of the symbol $\approx$ or the properties of the $\psi_\nu$'s.

2. We then estimate $\langle \sigma^2, \psi_\nu \rangle$ and specify a selection rule for $\mathcal{V}(\sigma^2)$ (with the dependence on $\sigma^2$ replaced by an estimator). The rule is dictated by hard thresholding over the estimates of the coefficients $\langle \sigma^2, \psi_\nu \rangle$, which are kept only if they exceed some noise level, tuned with the data, as in standard wavelet nonlinear approximation (Donoho, Johnstone, Kerkyacharian, Picard and collaborators [14, 15, 23]).

The key issue is therefore the estimation of the linear functionals

$$\langle \sigma^2, \psi_\nu \rangle = \int \psi_\nu(t)\, \sigma_t^2\, dt. \tag{1.4}$$

An important fact is that the functions $\psi_\nu$ are well located but oscillate, making the approximation of (1.4) delicate, in contrast to the global estimation of the integrated volatility: this is where we depart from the results of Jacod and collaborators [26, 32]. If we could observe the latent process $X$ itself at times $j\Delta_n$, then standard quadratic variation based estimators like

$$\sum_j \psi_\nu(j\Delta_n)\, \big(X_{j\Delta_n} - X_{(j-1)\Delta_n}\big)^2 \tag{1.5}$$

would give rate-optimal estimators of (1.4), as follows from standard results on nonparametric estimation in diffusion processes [18, 24, 25]. However, we only have a noisy version of $X$ via $(Z_{j,n})$ and further "intermediate" de-noising is required.
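As a numerical illustration of why (1.5) cannot be applied directly to the noisy data, the following sketch simulates a path of $X$ with zero drift and a deterministic volatility, adds i.i.d. Gaussian noise, and compares the realised variance (that is, (1.5) with $\psi_\nu \equiv 1$) computed from $X$ and from $Z$. The volatility function, the noise level `tau`, the sample size and the helper name `realised_variance` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def realised_variance(Y):
    """Sum of squared increments, i.e. (1.5) with psi identically equal to 1."""
    return float(np.sum(np.diff(Y) ** 2))

rng = np.random.default_rng(0)
n = 10_000          # number of increments, Delta_n = 1/n (assumed)
tau = 0.01          # noise standard deviation (assumed)

t = np.arange(n + 1) / n
sigma = 1.0 + 0.5 * np.sin(2 * np.pi * t[:-1])       # assumed deterministic volatility
dW = rng.normal(0.0, np.sqrt(1.0 / n), size=n)        # Brownian increments
X = np.concatenate(([0.0], np.cumsum(sigma * dW)))    # latent path, zero drift
Z = X + tau * rng.normal(size=n + 1)                  # noisy observations, cf. (1.1)

rv_latent = realised_variance(X)   # close to the integrated volatility 1.125
rv_noisy = realised_variance(Z)    # inflated by roughly 2 * n * tau**2 = 2.0
```

Under these assumptions the sum computed from $Z$ is dominated by the squared noise increments, whose total expected contribution $2n\tau^2$ does not vanish as $n$ grows, while the sum computed from $X$ stays close to $\int_0^1 \sigma_t^2\,dt$.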

At this stage, we consider local averages of the data $Z_{j,n}$ at an intermediate scale $m$ so that $\Delta_n \ll 1/m$ but $m \to \infty$. Let us denote loosely (and temporarily) by $\mathrm{Ave}(Z)_{i,m}$ an averaging of the data $(Z_{j,n})$ around the point $i/m$. We have

$$\mathrm{Ave}(Z)_{i,m} \approx X_{i/m} + \text{small noise} \tag{1.6}$$

and thus we have a de-blurred version of $X$, except that we must now handle the small noise term of (1.6) and the loss of information due to the fact that we only have (approximate) values $X_{i/m}$ on a coarser scale, since $m \ll \Delta_n^{-1}$. We subsequently estimate (1.4) replacing the naive guess (1.5) by

$$\sum_i \psi_\nu(i/m)\, \Big[\big(\mathrm{Ave}(Z)_{i,m} - \mathrm{Ave}(Z)_{i-1,m}\big)^2 + \text{bias correction}\Big] \tag{1.7}$$

up to a further bias correction term that comes from the fact that we take squares of the approximation of $X$ given by (1.6). In Section 3.1, we generalise (1.7) to arbitrary kernels within a certain class of oscillating pre-averaging functions, in the same spirit as in Gloter and Hoffmann [19] or Rosenbaum [34] where this technique is used for denoising stochastic volatility models corrupted by noise.

We prove in Theorems 3.4 and 2.9 an upper bound for our procedure in $L^p$-loss error over a fixed time horizon. Assuming that the path $t \mapsto \sigma_t^2$ has $s$ derivatives in $L^\pi$ with a prescribed probability, the upper bound is of the form $n^{-\alpha/2}$ for an explicit $\alpha = \alpha(s, p, \pi) < 1$, to within inessential logarithmic terms. We retrieve the expected results of wavelet thresholding over Besov spaces up to the noise rate $n^{-1/4}$ instead of the usual $n^{-1/2}$ in white Gaussian noise or density estimation, but that is inherent to the problem of microstructure noise, as already established in [20]. It is noteworthy that, although the rates of convergence depend on the smoothness parameters $(s, \pi)$, the thresholding procedure does not, and is therefore adaptive in that sense. A major difficulty is that, in order to employ the wavelet theory in this context, we must establish precise deviation bounds for quantities of the form (1.7), which requires delicate martingale techniques. We prove in Theorem 2.12 that this result is sharp, even if $t \mapsto \sigma_t^2$ is random so that we do not have a statistical model in the strict sense. In order to encompass this level of generality, we propose a modification of the notion of upper and lower rate of estimation of a random parameter in Definitions 2.4 and 2.6. This approach is presented in detail in the methodology Section 2.2.

The paper is organized as follows. In Section 2 we introduce notation 
and formulate the key results. An explicit construction of the estimator 
can be found in Section 3. Finally, the proofs of the main results and 
some (unavoidable) technicalities are deferred to Section 4. 



2 Main results 

2.1 The data generating model 

We consider a continuous adapted 1-dimensional process $X$ of the form (1.2) on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$. Without loss of generality, we assume that $X_0 = 0$.

Assumption 2.1. The processes $\sigma$ and $b$ are càdlàg (right continuous with left limits), $\mathcal{F}_t$-adapted, and a weak solution of (1.2) is unique and well defined.

Moreover, a weak solution to $Y_t = \int_0^t \sigma_s\,dW_s$ is also unique and well defined, the laws of $X$ and $Y$ are equivalent on $\mathcal{F}_1$ and we have, for some $\rho > 1$,

$$\mathbb{E}\Big[\exp\Big(\rho \int_0^1 \frac{b_s}{\sigma_s^2}\,dY_s\Big)\Big] < \infty. \tag{2.1}$$

We consider a fixed time horizon $T = n\Delta_n$, and with no loss of generality, we take $T = 1$ hence $\Delta_n = n^{-1}$. For $j = 0, \ldots, n$, we assume that we can observe a blurred version of $X$ at times $\Delta_n j = j/n$ over the time horizon $[0, T] = [0, 1]$. The blurring accounts for microstructure noise at fine scales and takes the form

$$Z_{j,n} := X_{j/n} + \varepsilon_{j,n}, \quad j = 0, 1, \ldots, n, \tag{2.2}$$

where the microstructure noise process $(\varepsilon_{j,n})$ is implicitly defined on the same probability space as $X$ and satisfies


Assumption 2.2. We have

$$\varepsilon_{j,n} = a(j/n, X_{j/n})\, \eta_{j,n}, \tag{2.3}$$

where the function $(t, x) \mapsto a(t, x)$ is continuous and bounded. Moreover, the random variables $(\eta_{j,n})$ are independent, and independent of $X$. Furthermore, for every $0 \le j \le n$ and $n \ge 1$, we have

$$\mathbb{E}[\eta_{j,n}] = 0, \quad \mathbb{E}[\eta_{j,n}^2] = 1, \quad \mathbb{E}[|\eta_{j,n}|^p] < \infty, \quad p > 0.$$

Given data $Z_\bullet = \{Z_{j,n}, j = 0, \ldots, n\}$ following (1.1), the goal is to estimate nonparametrically the random function $t \mapsto \sigma_t^2$ over the time interval $[0, 1]$. Asymptotics are taken as the observation frequency $n \to \infty$.



Discussion on Assumptions 2.1 and 2.2 

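Assumption 2.1 on $b$ and $\sigma$ is relatively weak, except for the moment condition (2.1). This assumption is somewhat technical, for it enables us to implicitly assume that $b = 0$. Indeed, if $\mathbb{P}_{\sigma,b}$ denotes the law of $(X_t)_{t \in [0,1]}$ with drift $b$ and volatility $\sigma$, we have by Girsanov's theorem

$$\frac{d\mathbb{P}_{\sigma,b}}{d\mathbb{P}_{\sigma,0}} = \exp\Big(\int_0^1 \frac{b_s}{\sigma_s^2}\,dX_s - \frac{1}{2}\int_0^1 \frac{b_s^2}{\sigma_s^2}\,ds\Big).$$

By Hölder's inequality, for a random variable $Z$, we derive

$$\mathbb{E}_{\sigma,b}\big[|Z|^p\big]^{1/p} \le \mathbb{E}_{\sigma,0}\Big[\exp\Big(\rho \int_0^1 \frac{b_s}{\sigma_s^2}\,dX_s\Big)\Big]^{1/(p\rho)}\, \mathbb{E}_{\sigma,0}\big[|Z|^{\tilde p}\big]^{1/\tilde p} \tag{2.4}$$

with $\tilde p = p\rho/(\rho - 1)$. Therefore, Condition (2.1) guarantees that if we have an estimate of the form $\mathbb{E}_{\sigma,0}[|Z|^p] \le c_p\, n^{-\gamma p}$ for any $p \ge 1$ and for some $\gamma > 0$, then the same property holds replacing $\mathbb{P}_{\sigma,0}$ by $\mathbb{P}_{\sigma,b}$, up to a modification of the constant $c_p$. Thus Condition (2.1) is a useful tool that enables us to condense the proofs in many places afterwards. It is satisfied as soon as $\sigma$ is bounded below and $b$ has appropriate integrability conditions. In some cases of interest where it may fail to hold, one can still proceed by working directly under $\mathbb{P}_{\sigma,b}$.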
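For the reader's convenience, the Hölder step behind (2.4) can be spelled out as follows. This is only a sketch: $L$ denotes the Girsanov density of $\mathbb{P}_{\sigma,b}$ with respect to $\mathbb{P}_{\sigma,0}$, the exponent $\tilde p = p\rho/(\rho-1)$ is notation chosen here, and the only facts used are that the compensator term in the exponent of $L$ is non-positive and that Hölder's inequality is applied with the conjugate exponents $\rho$ and $\rho/(\rho-1)$.

```latex
\begin{align*}
\mathbb{E}_{\sigma,b}\big[|Z|^p\big]
  &= \mathbb{E}_{\sigma,0}\big[L\,|Z|^p\big],
  \qquad
  L = \exp\Big(\int_0^1 \frac{b_s}{\sigma_s^2}\,dX_s
        - \frac12\int_0^1 \frac{b_s^2}{\sigma_s^2}\,ds\Big)
    \le \exp\Big(\int_0^1 \frac{b_s}{\sigma_s^2}\,dX_s\Big), \\
  &\le \mathbb{E}_{\sigma,0}\big[L^{\rho}\big]^{1/\rho}\,
       \mathbb{E}_{\sigma,0}\big[|Z|^{p\rho/(\rho-1)}\big]^{(\rho-1)/\rho} \\
  &\le \mathbb{E}_{\sigma,0}\Big[\exp\Big(\rho\int_0^1
         \frac{b_s}{\sigma_s^2}\,dX_s\Big)\Big]^{1/\rho}\,
       \mathbb{E}_{\sigma,0}\big[|Z|^{\tilde p}\big]^{p/\tilde p}.
\end{align*}
```

Raising both sides to the power $1/p$ gives the stated inequality, and the first factor is finite by (2.1) because $X$ and $Y$ have the same law under $\mathbb{P}_{\sigma,0}$.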

Concerning Assumption 2.2, we assume a relatively weak scheme of microstructure noise, by assuming that the $\varepsilon_{j,n}$ form a martingale array that may depend on the unobserved process $X$ through a function $t \mapsto a(t, X_t)$ acting as the standard deviation of the additive noise. This enables richer structures than simple additive independent noise. One may wish to relax Assumption 2.2 further by assuming a correlation decay only, but again, for technical reasons, we keep to this simpler framework.
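To fix ideas, noise satisfying Assumption 2.2 can be generated as in the following sketch. The Rademacher choice for $\eta_{j,n}$, the particular bounded function `a` and the helper name `microstructure_noise` are assumptions made for illustration only; any independent, centred, unit-variance variables with finite moments would do.

```python
import numpy as np

def microstructure_noise(X, a, rng):
    """Return eps_j = a(j/n, X_{j/n}) * eta_j for j = 0..n, cf. (2.3)."""
    n = len(X) - 1
    t = np.arange(n + 1) / n
    # Rademacher variables: independent of X, centred, unit variance, all moments finite
    eta = rng.choice([-1.0, 1.0], size=n + 1)
    return a(t, X) * eta

def a(t, x):
    # hypothetical continuous and bounded standard-deviation function
    return 0.01 * (1.0 + 0.5 * np.tanh(x))

rng = np.random.default_rng(0)
n = 10_000
# latent path with sigma = 1 and zero drift (assumed for the example)
X = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), size=n))))
eps = microstructure_noise(X, a, rng)
Z = X + eps   # observations as in (2.2)
```

Here the noise level depends on the current value of the latent process, which is exactly the kind of structure that simple additive independent noise does not allow.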



2.2 Statistical methodology 

Recovering $\sigma^2$ over a function class $\mathcal{D}$

Strictly speaking, since the target parameter $\sigma^2 = (\sigma_t^2)_{t \in [0,1]}$ is random itself (as an $\mathcal{F}$-adapted process), we cannot assess the performance of an "estimator of $\sigma^2$" in the usual way. We need to modify the usual notion of convergence rate over a function class.



Definition 2.3. An estimator of $\sigma^2 = (\sigma_t^2)_{t \in [0,1]}$ is a random function

$$t \mapsto \widehat{\sigma}_n^2(t), \quad t \in [0, 1],$$

measurable with respect to the observation $(Z_{j,n})$ defined in (1.1).

Let us denote by $\mathcal{D}$ a class of real-valued functions defined on $[0, 1]$.

Definition 2.4. We say that the rate $0 < v_n \to 0$ (as $n \to \infty$) is achievable for estimating $\sigma^2$ in $L^p$-norm over $\mathcal{D}$ if there exists an estimator $\widehat{\sigma}_n^2$ such that

$$\limsup_{n \to \infty}\, v_n^{-1}\, \mathbb{E}\Big[\big\|\widehat{\sigma}_n^2 - \sigma^2\big\|_{L^p([0,1])}\, \mathbf{1}_{\{\sigma^2 \in \mathcal{D}\}}\Big] < \infty. \tag{2.5}$$

Remark 2.5. If we wish $(\sigma_t)$ to be deterministic, we can make a priori assumptions so that the condition $\sigma^2 \in \mathcal{D}$ is satisfied, in which case we simply ignore the indicator in (2.5). In other cases, this condition will be satisfied with some probability (see below). But it may also well happen that for some choices of $\mathcal{D}$ we have $\mathbb{P}[\sigma^2 \in \mathcal{D}] = 0$, in which case the upper bound (2.5) becomes trivial and noninformative.

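In this context, a sound notion of optimality is unclear. We propose the following

Definition 2.6. The rate $v_n$ is a lower rate of convergence over $\mathcal{D}$ in $L^p$-norm if there exists a filtered probability space $(\widetilde{\Omega}, \widetilde{\mathcal{F}}, (\widetilde{\mathcal{F}}_t)_{t \ge 0}, \widetilde{\mathbb{P}})$, a process $\widetilde{X}$ defined on $(\widetilde{\Omega}, \widetilde{\mathcal{F}})$ with the same distribution as $X$ under Assumption 2.1, together with a process $(\widetilde{\varepsilon}_{j,n})$ satisfying (2.3) with $\widetilde{X}$ in place of $X$, such that Assumption 2.2 holds, and moreover:

$$\widetilde{\mathbb{P}}\big[\sigma^2 \in \mathcal{D}\big] > 0 \tag{2.6}$$

and

$$\liminf_{n \to \infty}\, v_n^{-1} \inf_{\widehat{\sigma}_n^2} \widetilde{\mathbb{E}}\Big[\big\|\widehat{\sigma}_n^2 - \sigma^2\big\|_{L^p([0,1])}\, \mathbf{1}_{\{\sigma^2 \in \mathcal{D}\}}\Big] > 0, \tag{2.7}$$

where the infimum is taken over all estimators.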







Let us elaborate on Definition 2.6: as already mentioned, $\sigma^2$ is "genuinely" random, and we cannot say that our data $\{Z_{j,n}\}$ generate a statistical experiment as a family of probability measures indexed by some parameter of interest. Rather, we have a fixed probability measure $\mathbb{P}$, but this measure is only "loosely" specified by very weak conditions, namely Assumptions 2.1 and 2.2. A lower bound as in Definition 2.6 says that, given a model $\mathbb{P}$, there exists a probability measure $\widetilde{\mathbb{P}}$, possibly defined on another space, so that Assumptions 2.1 and 2.2 hold under $\widetilde{\mathbb{P}}$ together with (2.7). Without further specification of our model, there is no sensible way to discriminate between $\mathbb{P}$ and $\widetilde{\mathbb{P}}$ since both measures (and the accompanying processes) satisfy Assumptions 2.1 and 2.2; moreover, under $\widetilde{\mathbb{P}}$, we have a lower bound.

Function classes: wavelets and Besov spaces 

We describe the smoothness of a function by means of Besov spaces on the interval. A thorough account of Besov spaces $\mathcal{B}^s_{\pi,\infty}$ and their connection to wavelet bases in a statistical setting is given in the classical papers of Donoho et al. [15] and Kerkyacharian and Picard [28]. Let us recall some fairly classical^3 material about Besov spaces through their characterisation in terms of wavelets. We use $n_0$-regular wavelet bases $(\psi_\nu)_\nu$ adapted to the domain $[0, 1]$. More precisely, the multi-index $\nu$ concatenates the spatial index and the resolution level $j = |\nu|$. We set $\Lambda_j := \{\nu, |\nu| = j\}$ and $\Lambda := \cup_{j \ge -1} \Lambda_j$. Thus for $f \in L^2([0, 1])$, we have

$$f = \sum_{j \ge -1} \sum_{\nu \in \Lambda_j} \langle f, \psi_\nu \rangle\, \psi_\nu,$$

where we have set $j := -1$ in order to incorporate the low frequency part of the decomposition. From now on the basis $(\psi_\nu)_\nu$ is fixed and depends on a regularity index $n_0$ whose role is specified in Assumption 2.8 below.

Definition 2.7. For $s > 0$ and $\pi \in (0, \infty]$, a function $f : [0, 1] \to \mathbb{R}$ belongs to the Besov space $\mathcal{B}^s_{\pi,\infty}([0, 1])$ if the following norm is finite:

$$\|f\|_{\mathcal{B}^s_{\pi,\infty}([0,1])} := \sup_{j \ge -1}\, 2^{j(s + 1/2 - 1/\pi)} \Big(\sum_{\nu \in \Lambda_j} |\langle f, \psi_\nu \rangle|^\pi\Big)^{1/\pi} \tag{2.8}$$

with the usual modification if $\pi = \infty$.
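The sequence-space norm (2.8) is straightforward to evaluate once the wavelet coefficients are available level by level. The sketch below assumes the coefficients are stored in a dictionary mapping each level $j$ to an array; this data layout and the function name `besov_norm` are illustrative choices, and only finitely many levels are inspected, so the result is a lower bound for the supremum in (2.8).

```python
import numpy as np

def besov_norm(coeffs, s, pi):
    """Evaluate (2.8) over the levels present in `coeffs`.

    coeffs maps a resolution level j >= -1 to the array of wavelet
    coefficients <f, psi_nu> with |nu| = j; pi may be float('inf').
    """
    level_values = []
    for j, c in coeffs.items():
        c = np.abs(np.asarray(c, dtype=float))
        if np.isinf(pi):
            ell_pi = c.max()       # usual modification for pi = infinity
            inv_pi = 0.0
        else:
            ell_pi = np.sum(c ** pi) ** (1.0 / pi)
            inv_pi = 1.0 / pi
        level_values.append(2.0 ** (j * (s + 0.5 - inv_pi)) * ell_pi)
    return max(level_values)

# Coefficients decaying like 2^{-j(s0 + 1/2)} at level j, with 2^j entries per level
s0 = 0.5
coeffs = {j: np.full(2 ** j, 2.0 ** (-j * (s0 + 0.5))) for j in range(8)}
norm_at_s0 = besov_norm(coeffs, s=0.5, pi=2.0)   # each level contributes exactly 1
norm_above = besov_norm(coeffs, s=1.0, pi=2.0)   # grows like 2^{j/2} with the level
```

For this synthetic sequence the weighted level norms stay bounded exactly when $s \le s_0$, which is the sense in which (2.8) measures smoothness through coefficient decay.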

Precise connection between this definition of Besov norm and more standard ones can be found in [9, 10]. Given a basis $(\psi_\nu)_\nu$ with regularity index $n_0 > 0$, the Besov space defined by (2.8) exactly matches the usual definition in terms of modulus of smoothness for $f$, provided that $\pi \ge 1$ and $s < n_0$. A particular case is the Hölder space $\mathcal{C}^s([0, 1]) = \mathcal{B}^s_{\infty,\infty}([0, 1])$. Moreover, the following Sobolev embedding inequality holds:

$$\|f\|_{\mathcal{B}^{s_2}_{\pi_2,\infty}([0,1])} \lesssim \|f\|_{\mathcal{B}^{s_1}_{\pi_1,\infty}([0,1])} \quad \text{for } s_1 - 1/\pi_1 = s_2 - 1/\pi_2, \ \pi_2 \ge \pi_1,$$

showing in particular that $\mathcal{B}^s_{\pi,\infty}([0, 1])$ is embedded into continuous functions as soon as $s > 1/\pi$. The additional properties of the wavelet basis $(\psi_\nu)_\nu$ that we need are summarized in the next assumption.

^3 We follow closely the notation of Cohen [9].

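Assumption 2.8 (Properties of the basis $(\psi_\nu)_\nu$). For $\pi \ge 1$:

• For some arbitrary $n_0 > 0$ and for all $s \le n_0$, $j_0 \ge 0$, we have

$$\Big\|f - \sum_{j \le j_0} \sum_{\nu \in \Lambda_j} \langle f, \psi_\nu \rangle\, \psi_\nu\Big\|_{L^\pi([0,1])} \lesssim 2^{-j_0 s}\, \|f\|_{\mathcal{B}^s_{\pi,\infty}([0,1])}. \tag{2.9}$$

• For any $\Lambda_0 \subset \Lambda$,

$$\int_0^1 \Big(\sum_{\nu \in \Lambda_0} |\psi_\nu(x)|^2\Big)^{\pi/2} dx \sim \sum_{\nu \in \Lambda_0} \|\psi_\nu\|^\pi_{L^\pi([0,1])}. \tag{2.10}$$

• If $\pi > 1$, for any sequence $(u_\nu)_{\nu \in \Lambda}$,

$$\Big\|\Big(\sum_{\nu \in \Lambda} |u_\nu \psi_\nu|^2\Big)^{1/2}\Big\|_{L^\pi([0,1])} \sim \Big\|\sum_{\nu \in \Lambda} u_\nu \psi_\nu\Big\|_{L^\pi([0,1])}. \tag{2.11}$$

The symbol $\sim$ means inequality in both ways, up to a constant depending on $\pi$ only. The property (2.9) reflects that our definition (2.8) of Besov spaces matches the definition in terms of linear approximation. Property (2.11) is an unconditional basis property and (2.10) is referred to as a superconcentration inequality, see [28]. The existence of compactly supported wavelet bases satisfying Assumption 2.8 goes back to Daubechies and is discussed for instance in [9].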

We are interested in the case where $\sigma^2$ may belong to various smoothness classes, that include the case where $\sigma^2$ is deterministic and has as many derivatives as one wishes, but also the case of genuinely random processes that oscillate like diffusions, or fractional diffusions and so on. These smoothness properties are usually modelled in terms of Besov balls

$$\mathcal{B}^s_{\pi,\infty}(c) := \big\{f : [0, 1] \to \mathbb{R}, \ \|f\|_{\mathcal{B}^s_{\pi,\infty}([0,1])} \le c\big\}, \quad c > 0, \tag{2.12}$$

that measure smoothness of degree $s > 1/\pi$ in $L^\pi$ over the interval $[0, 1]$, for $\pi \in (0, \infty)$. The restriction $s > 1/\pi$ ensures that the functions in $\mathcal{B}^s_{\pi,\infty}$ are continuously embedded into Hölder continuous functions with index $s - 1/\pi$. Besov balls also give a flexible way to describe the smoothness of the path of a continuous random process. For instance, if $(\sigma_t)$ is an Itô continuous semimartingale itself with regular coefficients, we have

$$\mathbb{P}\big[\sigma^2 \in \mathcal{B}^{1/2}_{\pi,\infty}(c)\big] > 0, \quad \text{for every } \pi > 2.$$

If it is a smooth transformation of a fractional Brownian motion with Hurst index $H$, we have $\mathbb{P}[\sigma^2 \in \mathcal{B}^H_{\pi,\infty}(c)] > 0$ for $\pi > 1/H$ likewise. The proof of such classical results can be found in Ciesielski et al. [8].



2.3 Achievable estimation error bounds 

For prescribed smoothness classes of the form $\mathcal{D} = \mathcal{B}^s_{\pi,\infty}(c)$ and $L^p$-loss functions, the rate of convergence $v_n$ depends on the indices $s$, $\pi$ and $p$. Define the rate exponent

$$\alpha(s, p, \pi) = \min\Big\{\frac{s}{1 + 2s}, \ \frac{s + 1/p - 1/\pi}{1 + 2s - 2/\pi}\Big\}. \tag{2.13}$$

Theorem 2.9. Work under Assumptions 2.1 and 2.2. Then, for every $c > 0$, the rate $n^{-\alpha(s,p,\pi)/2}$ is achievable over the class $\mathcal{B}^s_{\pi,\infty}(c)$ in $L^p$-norm with $p \in [1, \infty)$, provided $s > 1/\pi$ and $\pi \in (0, \infty)$, up to logarithmic corrections.

Moreover, under Assumption 2.8, the estimator explicitly constructed in Section 3.3 below attains this bound in the sense of (2.5), up to logarithmic corrections.

Remark 2.10. A (technical) restriction is that we assume $s > 1/\pi$, a condition that guarantees some minimal Hölder smoothness for the path $t \mapsto \sigma_t^2$.

Remark 2.11. The parametric rate $n^{-1/2}$ (formally obtained when letting $s \to \infty$ in the definition of $\alpha(s, p, \pi)$) has to be replaced by $n^{-1/4}$. This effect is due to microstructure noise, and was already identified in earlier parametric models as in Gloter and Jacod [20] and subsequent works, in parametric, semiparametric and nonparametric estimation alike, as follows from [20, 21, 7, 30, 38, 26] among others.
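The behaviour of the rate exponent is easy to explore numerically. The following sketch evaluates $\alpha(s, p, \pi) = \min\{s/(1+2s),\ (s + 1/p - 1/\pi)/(1 + 2s - 2/\pi)\}$ and the corresponding rate $n^{-\alpha/2}$, ignoring logarithmic corrections; the function names `alpha` and `rate` are introduced here for illustration.

```python
def alpha(s, p, pi):
    """Rate exponent of (2.13); meaningful for s > 1/pi."""
    dense = s / (1.0 + 2.0 * s)
    sparse = (s + 1.0 / p - 1.0 / pi) / (1.0 + 2.0 * s - 2.0 / pi)
    return min(dense, sparse)

def rate(n, s, p, pi):
    """n^{-alpha(s, p, pi)/2}, without the logarithmic corrections."""
    return n ** (-alpha(s, p, pi) / 2.0)

a_dense = alpha(1.0, 2.0, 2.0)     # first term of the minimum is active
a_sparse = alpha(1.5, 8.0, 1.0)    # second term of the minimum is active
a_smooth = alpha(1e6, 2.0, 2.0)    # very smooth paths: exponent close to 1/2
```

As $s$ grows the exponent approaches $1/2$, so the rate approaches $n^{-1/4}$, in line with Remark 2.11.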

Our next result shows that this rate is nearly optimal in many cases. 




Theorem 2.12. In the same setting as in Theorem 2.9, assume moreover that $s - 1/\pi > (1 + \sqrt{5})/4$. Then the rate $n^{-\alpha(s,p,\pi)/2}$ is a lower rate of convergence over $\mathcal{B}^s_{\pi,\infty}(c)$ in $L^p$ in the sense of Definition 2.6.

Since the upper and lower bound agree up to some (inessential) loga- 
rithmic corrections, our result is nearly optimal in the sense of Definitions 
2.4 and 2.6. 

The proof of the lower bound is an application of a recent result of Reiß [33] about asymptotic equivalence between the statistical model obtained by letting $\sigma^2$ be deterministic and the microstructure noise white Gaussian, and an appropriate infinite dimensional Gaussian shift experiment. In particular, the restriction on $s - 1/\pi$ in Theorem 2.12 stems from the result of Reiß and could presumably be improved. Our proof relies on the following strategy: we transfer the lower bound into a Bayesian estimation problem by constructing $\widetilde{\mathbb{P}}$ adequately. We then use the asymptotic equivalence result of Reiß in order to approximate the conditional law of the data given $\sigma$ under $\widetilde{\mathbb{P}}$ by a classical Gaussian shift experiment, thanks to a Markov kernel. In the special case $p = \pi = 2$, we could also derive the result by using the lower bound in [30]. This setting also makes it possible to retrieve the standard minimax framework when $\sigma^2$ is deterministic and belongs to a Besov ball $\mathcal{B}^s_{\pi,\infty}(c)$. In that case, it suffices to construct a probability measure $\widetilde{\mathbb{P}}$ such that under $\widetilde{\mathbb{P}}$, the random variable $\sigma^2$ has distribution $\mu(d\sigma^2)$ with support in $\mathcal{B}^s_{\pi,\infty}(c)$, and is chosen to be a least favourable prior as in standard lower bound nonparametric techniques. It remains to check that Assumptions 2.1 and 2.2 are satisfied $\mu$-almost surely. We elaborate on this approach in the proof of Theorem 2.12 below.



3 Wavelet estimation and pre-averaging 

3.1 Estimating linear functionals 

We estimate $\sigma^2$ via linear functionals of the form

$$\langle \sigma^2, h_{\ell k} \rangle := \int_0^1 2^{\ell/2}\, h(2^\ell t - k)\, d\langle X \rangle_t.$$

With no possible confusion, we denote by $\langle \cdot, \cdot \rangle$ the inner product of $L^2([0, 1])$ and by $\langle X \rangle_t = \int_0^t \sigma_s^2\,ds$ the quadratic variation of the continuous semimartingale $X$. Here, the integers $\ell \ge 0$ and $k$ are respectively a resolution level and a location parameter. The test function $h : \mathbb{R} \to \mathbb{R}$ is smooth and throughout the paper we will assume that $h$ is compactly supported on $[0, 1]$. Thus, $h_{\ell k} = 2^{\ell/2} h(2^\ell \cdot - k)$ is essentially located around $(k + 1/2)/2^\ell$.

Definition 3.1. We say that $\lambda : [0, 2) \to \mathbb{R}$ is a pre-averaging function if it is piecewise Lipschitz continuous, satisfies $\lambda(t) = -\lambda(2 - t)$, and is not zero identically. To each pre-averaging function $\lambda$ we associate the quantity

$$\overline{\lambda} := \Big(2 \int_0^1 \Big(\int_0^s \lambda(u)\,du\Big)^2 ds\Big)^{1/2}$$

and define the (normalized) pre-averaging function $\widetilde{\lambda} := \lambda / \overline{\lambda}$.
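As a quick sanity check on the normalising constant $\overline{\lambda} = (2\int_0^1 (\int_0^s \lambda(u)\,du)^2 ds)^{1/2}$, the sketch below evaluates it numerically for two antisymmetric functions on $[0, 2)$. The two examples (a Haar-type step and a cosine), the grid size and the helper name `lambda_bar` are assumptions chosen for illustration; for them the constant can also be computed by hand, giving $\sqrt{2/3}$ and $2/\pi$ respectively.

```python
import numpy as np

def lambda_bar(lam, grid_size=20_001):
    """Trapezoid-rule approximation of (2 * int_0^1 (int_0^s lam)^2 ds)^(1/2)."""
    s = np.linspace(0.0, 1.0, grid_size)
    ds = s[1] - s[0]
    vals = lam(s)
    # inner integral s -> int_0^s lam(u) du on the grid
    inner = np.concatenate(([0.0], np.cumsum(0.5 * (vals[1:] + vals[:-1]) * ds)))
    # outer integral of the squared primitive over [0, 1]
    outer = np.sum(0.5 * (inner[1:] ** 2 + inner[:-1] ** 2) * ds)
    return float(np.sqrt(2.0 * outer))

def haar_type(u):
    # +1 on [0, 1), -1 on [1, 2): satisfies lam(t) = -lam(2 - t) away from t = 1
    return np.where(u < 1.0, 1.0, -1.0)

def cosine_type(u):
    # cos(pi u / 2) satisfies lam(t) = -lam(2 - t) on [0, 2)
    return np.cos(0.5 * np.pi * u)

lb_haar = lambda_bar(haar_type)
lb_cos = lambda_bar(cosine_type)
```

Only the values of $\lambda$ on $[0, 1]$ enter the constant, since the antisymmetry makes the primitive of $\lambda$ symmetric about $1$.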

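For $1 \le m < n$ and a sequence $(Y_{j,n}, j = 0, \ldots, n)$, we define the pre-averaging of $Y$ at scale $m$ relative to $\lambda$ by setting, for $i = 2, \ldots, m$,

$$\overline{Y}_{i,m}(\lambda) := \frac{m}{n} \sum_{\frac{j}{n} \in (\frac{i-2}{m}, \frac{i}{m}]} \widetilde{\lambda}\Big(m\frac{j}{n} - (i - 2)\Big)\, Y_{j,n}, \tag{3.1}$$

the summation being taken w.r.t. the index $j$. If $Y_{j,n}$ has the form $Y_{j/n}$ for some underlying continuous time process $t \mapsto Y_t$, the pre-averaging of $Y$ at scale $m$ is a kind of local average that mimics the behaviour of $Y_{i/m} - Y_{(i-2)/m}$. Indeed, using $\lambda(t) = -\lambda(2 - t)$ for $t \in (0, 1]$,

$$\overline{Y}_{i,m}(\lambda) \approx \frac{m}{n} \sum_{\frac{j}{n} \in (0, \frac{1}{m}]} \widetilde{\lambda}\Big(m\frac{j}{n}\Big) \Big(Y_{\frac{i-2}{m} + \frac{j}{n}} - Y_{\frac{i}{m} - \frac{j}{n}}\Big).$$

Thus, $\overline{Y}_{i,m}(\lambda)$ might be interpreted as a sum of differences in the interval $[(i - 2)/m, i/m]$, weighted by $\widetilde{\lambda}$.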

From (1.5), a first guess for estimating $\langle \sigma^2, h_{\ell k} \rangle$ is to consider the quantity

$$\sum_{i=2}^{m} h_{\ell k}\Big(\frac{i-1}{m}\Big)\, \overline{Z}_{i,m}^2$$

for some intermediate scale $m$ that needs to be tuned with $n$ and that reduces the effect of the noise $(\varepsilon_{j,n})$ in the representation (1.1). However, such a procedure is biased and a further correction is needed. To that end, we introduce

$$b(\lambda, Z_\bullet)_{i,m} := \frac{m^2}{2n^2} \sum_{\frac{j}{n} \in (\frac{i-2}{m}, \frac{i}{m}]} \widetilde{\lambda}^2\Big(m\frac{j}{n} - (i - 2)\Big)\, \big(Z_{j,n} - Z_{j-1,n}\big)^2. \tag{3.2}$$

In order to get a first intuition, note that $(Z_{j,n} - Z_{j-1,n})^2 \approx (\varepsilon_{j,n} - \varepsilon_{j-1,n})^2$. Further stochastic approximations, detailed in the proof in Section 4.1, show that subtracting $b(\lambda, Z_\bullet)_{i,m}$ corrects in a natural way for the bias induced by the additive microstructure noise.

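Finally, our estimator of $\langle \sigma^2, h_{\ell k} \rangle$ is

$$\mathcal{E}_m(h_{\ell k}) := \sum_{i=2}^{m} h_{\ell k}\Big(\frac{i-1}{m}\Big) \Big[\overline{Z}_{i,m}^2 - b(\lambda, Z_\bullet)_{i,m}\Big]. \tag{3.3}$$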
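The construction of this section can be sketched in a few lines: pre-average the data over windows $((i-2)/m, i/m]$ with weights $\widetilde{\lambda}(mj/n - (i-2))$, subtract the bias term $\frac{m^2}{2n^2}\sum \widetilde{\lambda}^2 (Z_{j,n} - Z_{j-1,n})^2$, and sum against the test function evaluated at $(i-1)/m$. The Haar-type pre-averaging function, the choice $m = \sqrt{n}$, the constant volatility, the noise level and the function names are assumptions made for this illustration; it is not claimed to reproduce the authors' implementation.

```python
import numpy as np

def lam_tilde(u):
    """Normalised Haar-type pre-averaging function (assumed example):
    +1 on [0, 1), -1 on [1, 2), zero elsewhere, divided by lambda_bar = sqrt(2/3)."""
    u = np.asarray(u, dtype=float)
    lam = np.where((u >= 0) & (u < 1), 1.0, np.where((u >= 1) & (u < 2), -1.0, 0.0))
    return lam / np.sqrt(2.0 / 3.0)

def estimate_functional(Z, m, h):
    """Bias-corrected pre-averaged estimate of int_0^1 h(t) sigma_t^2 dt."""
    n = len(Z) - 1
    j = np.arange(n + 1)
    dZ2 = np.concatenate(([0.0], np.diff(Z) ** 2))    # (Z_j - Z_{j-1})^2 at index j
    total = 0.0
    for i in range(2, m + 1):
        mask = (j * m > (i - 2) * n) & (j * m <= i * n)   # j/n in ((i-2)/m, i/m]
        w = lam_tilde(m * j[mask] / n - (i - 2))
        Z_bar = (m / n) * np.sum(w * Z[mask])             # pre-averaged value, cf. (3.1)
        bias = (m ** 2 / (2 * n ** 2)) * np.sum(w ** 2 * dZ2[mask])   # cf. (3.2)
        total += h((i - 1) / m) * (Z_bar ** 2 - bias)
    return total

rng = np.random.default_rng(1)
n, m, tau = 40_000, 200, 0.05                             # m = sqrt(n) (assumed tuning)
X = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), size=n))))
Z = X + tau * rng.normal(size=n + 1)

est = estimate_functional(Z, m, lambda t: 1.0)   # targets int_0^1 sigma_t^2 dt = 1
naive = float(np.sum(np.diff(Z) ** 2))           # realised variance of the noisy data
```

With $h \equiv 1$ the target is the integrated volatility. In this setting the naive realised variance is of order $2n\tau^2 = 200$, whereas the pre-averaged estimate fluctuates around $1$ with a standard deviation of order $m^{-1/2}$.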



3.2 The wavelet threshold estimator 

Let $(\varphi, \psi)$ denote a pair of scaling function and mother wavelet that generate a wavelet basis $(\psi_\nu)_\nu$ satisfying Assumption 2.8. The random function $t \mapsto \sigma_t^2$, taken path-by-path as an element of $L^2([0, 1])$, has for every non-negative integer $\ell_0$ an almost-sure representation

$$\sigma^2_\bullet = \sum_{k \in \Lambda_{\ell_0}} c_{\ell_0 k}\, \varphi_{\ell_0 k}(\bullet) + \sum_{\ell \ge \ell_0} \sum_{k \in \Lambda_\ell} d_{\ell k}\, \psi_{\ell k}(\bullet), \tag{3.4}$$

with $c_{\ell_0 k} = \langle \sigma^2, \varphi_{\ell_0 k} \rangle = \int_0^1 \varphi_{\ell_0 k}(t)\, d\langle X \rangle_t$ and $d_{\ell k} = \langle \sigma^2, \psi_{\ell k} \rangle = \int_0^1 \psi_{\ell k}(t)\, d\langle X \rangle_t$. For every $\ell \ge 0$, the index set $\Lambda_\ell$ has cardinality $2^\ell$ (and also incorporates boundary terms in the first part of the expansion that we choose not to distinguish in the notation from $\varphi_{\ell_0 k}$ for simplicity). The choice of $\ell_0$ in (3.4) determines the representation of $\sigma^2$ as the sum of a low resolution approximation based on the scaling function $\varphi$ and a high-frequency wavelet decomposition, see Section 2.2. Following the standard wavelet threshold algorithm (see for instance [15] and in its more condensed form [28]), we approximate Formula (3.4) by

$$\widehat{\sigma}_n^2(\bullet) := \sum_{k \in \Lambda_{\ell_0}} \mathcal{E}_m(\varphi_{\ell_0 k})\, \varphi_{\ell_0 k}(\bullet) + \sum_{\ell = \ell_0}^{\ell_1} \sum_{k \in \Lambda_\ell} T_\tau\big[\mathcal{E}_m(\psi_{\ell k})\big]\, \psi_{\ell k}(\bullet), \tag{3.5}$$

where the wavelet coefficient estimates $\mathcal{E}_m(\varphi_{\ell_0 k})$ and $\mathcal{E}_m(\psi_{\ell k})$ are given by (3.3) and

$$T_\tau[x] = x\, \mathbf{1}_{\{|x| \ge \tau\}}, \quad \tau > 0, \ x \in \mathbb{R},$$

is the standard hard-threshold operator. Thus $t \mapsto \widehat{\sigma}_n^2(t)$ is specified by the resolution levels $\ell_0$, $\ell_1$, the threshold $\tau$ and the estimators $\mathcal{E}_m(\varphi_{\ell_0 k})$ and $\mathcal{E}_m(\psi_{\ell k})$, which in turn are entirely determined by the choice of the pre-averaging function $\lambda$ and the pre-averaging resolution level $m$ (and of course, the choice of the basis generated by $(\varphi, \psi)$ on $L^2([0, 1])$).
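The thresholding step itself is elementary: each estimated detail coefficient is kept unchanged if its absolute value reaches the threshold and is set to zero otherwise, while the coarse-scale coefficients are left untouched. A minimal sketch follows, where the dictionary layout of the coefficients, the numerical values and the helper names are hypothetical.

```python
import numpy as np

def hard_threshold(x, tau):
    """T_tau[x] = x if |x| >= tau and 0 otherwise, applied entrywise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) >= tau, x, 0.0)

def threshold_detail_coefficients(detail, tau):
    """detail maps a resolution level to the array of estimated wavelet coefficients."""
    return {level: hard_threshold(c, tau) for level, c in detail.items()}

# Hypothetical estimated detail coefficients on three levels
detail = {
    0: np.array([0.8]),
    1: np.array([0.05, -0.3]),
    2: np.array([0.02, -0.01, 0.2, 0.1]),
}
kept = threshold_detail_coefficients(detail, tau=0.2)
```

Because the threshold does not involve the smoothness parameters of the unknown path, the same rule is applied whatever the regularity of $\sigma^2$, which is the source of the adaptivity discussed in the introduction.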

3.3 Convergence rates 

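We first give two results on the properties of $\mathcal{E}_m(h_{\ell k})$ for estimating $\langle \sigma^2, h_{\ell k} \rangle$.

Theorem 3.2 (Moment bounds). Work under Assumptions 2.1 and 2.2. Let us assume that $h$ admits a piecewise Lipschitz derivative and that $2^\ell \le m \le n^{1/2}$. If $s > 1/\pi$, for any $c > 0$, for every $p \ge 1$, we have

$$\mathbb{E}\Big[\big|\mathcal{E}_m(h_{\ell k}) - \langle \sigma^2, h_{\ell k} \rangle\big|^p\, \mathbf{1}_{\{\sigma^2 \in \mathcal{B}^s_{\pi,\infty}(c)\}}\Big] \lesssim m^{-p/2} + m^{-\min\{s - 1/\pi, 1\}p}\, |h_{\ell k}|^p_{1,m},$$

where $|h_{\ell k}|_{1,m} := m^{-1} \sum_{i=1}^{m} |h_{\ell k}(i/m)|$. The symbol $\lesssim$ means up to a constant that does not depend on $m$ and $n$.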

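Theorem 3.3 (Deviation bounds). Work under Assumptions 2.1 and 2.2. Let us assume that $h$ admits a piecewise Lipschitz derivative and that $2^\ell \le m \le n^{1/2}$. If moreover

$$m 2^{-\ell} \ge m^q, \quad \text{for some } q > 0,$$

then, if $s > 1/\pi$, for any $c > 0$, for every $p \ge 1$, we have

$$\mathbb{P}\Big[\big|\mathcal{E}_m(h_{\ell k}) - \langle \sigma^2, h_{\ell k} \rangle\big| \ge \kappa \Big(\frac{\log m}{m}\Big)^{1/2}, \ \sigma^2 \in \mathcal{B}^s_{\pi,\infty}(c)\Big] \lesssim m^{-p}$$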

K > 4(^)^/^('c + \/2l ||a||Loo||A||i2A~"^ + ||a||ioc||A|||2A 

and

$$m^{-(s - 1/\pi)}\, |h_{\ell k}|_{1,m} \lesssim m^{-1/2},$$

where $\overline{c} := \sup_{\sigma^2 \in \mathcal{B}^s_{\pi,\infty}(c)} \|\sigma^2\|_{L^\infty}$.
Theorem 3.4. Work under Assumptions 2.1, 2.2 and 2.8. Let $\widehat{\sigma}_n^2$ denote the wavelet estimator defined in (3.5), constructed from $(\varphi, \psi)$ and a pre-averaging function $\lambda$, such that

$$m \sim n^{1/2}, \quad 2^{\ell_0} \sim m^{1 - 2\alpha_0} \ \text{for some } 0 < \alpha_0 < 1/2, \quad 2^{\ell_1} \sim m^{1/(1 + 2\alpha_0)},$$

and $\tau := \kappa\, (\log m / m)^{1/2}$ for sufficiently large $\kappa > 0$. Then, for

$$\alpha_0 + 1/\pi \le s \le \max\{\alpha_0/(1 - 2\alpha_0), n_0\},$$

the estimator $\widehat{\sigma}_n^2$ achieves (2.5) over $\mathcal{D} = \mathcal{B}^s_{\pi,\infty}(c)$ with $v_n = n^{-\alpha(s,p,\pi)/2}$ up to logarithmic factors. As a consequence, we have Theorem 2.9.

Proof. Thanks to Theorems 3.2 and 3.3, Theorem 3.4 is now a conse- 
quence of the general theory of wavelet threshold estimators, as developed 
by Kerkyacharian and Picard [28]. To that end, it suffices to obtain ap- 
propriate moment bounds and large deviation inequalities for estimators 
of wavelet coefficients in wavelet bases satisfying Assumption 2.8. 



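More precisely, by assumption, we have $s - 1/\pi \ge \alpha_0$ and $2^{\ell_0} \sim m^{1 - 2\alpha_0}$; therefore, the term $m^{-\min\{s - 1/\pi, 1\}}\, |h_{\ell k}|_{1,m}$ is less than a constant times

$$m^{-1/2},$$

where we used that $|h_{\ell k}|_{1,m} \lesssim 2^{-\ell/2}$ with $h = \varphi$. This together with Theorem 3.2 shows that we have the moment bound

$$\mathbb{E}\Big[\big|\mathcal{E}_m(\varphi_{\ell_0 k}) - \langle \sigma^2, \varphi_{\ell_0 k} \rangle\big|^p\, \mathbf{1}_{\{\sigma^2 \in \mathcal{B}^s_{\pi,\infty}(c)\}}\Big] \lesssim m^{-p/2} \lesssim n^{-p/4},$$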



so that Condition (5.1) of Theorem 5.1 in Kerkyacharian and Picard [28] 
is satisfied with c{n) = (log n/n)^'^ and A(n) = n^'^ with the notation 
of [28]. In the same way, by Theorem 3.3, with h = ip, for every p > 1, 
we obtain, for a large enough k the deviation bound 



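$$\mathbb{P}\Big[\,|\mathcal{E}_m(\psi_{\ell k}) - \langle \sigma^2, \psi_{\ell k}\rangle| \ge \kappa \Big(\frac{p \log m}{m}\Big)^{1/2}, \ \sigma^2 \in B^s_{\pi,\infty}(c)\Big] \lesssim m^{-p} \lesssim n^{-p/2},$$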



and therefore Condition (5.2) of Theorem 5.1 in [28] is satisfied with 
the same specification. This is all that is required to apply the wavelet 
threshold algorithm: by Corollary 5.2 and Theorem 6.1 of [28] we obtain 
(2.5), hence Theorem 2.9. □
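
The thresholding step invoked in this proof can be illustrated by a minimal Python sketch of hard thresholding applied to a list of estimated wavelet coefficients. The function names, the tuning constant `kappa` and the level `kappa * sqrt(log m / m)` are illustrative assumptions suggested by the deviation bound of Theorem 3.3; the sketch is not the estimator (3.5) itself, and the coefficient values are made up.

```python
import math

def threshold_level(m, kappa=1.0):
    """Assumed threshold kappa * sqrt(log m / m); kappa is a hypothetical tuning constant."""
    return kappa * math.sqrt(math.log(m) / m)

def hard_threshold(coeffs, tau):
    """Keep a coefficient if its absolute value is at least tau, otherwise set it to zero."""
    return [c if abs(c) >= tau else 0.0 for c in coeffs]

# Toy usage with made-up coefficient estimates (not data from the paper).
m = 1000
tau = threshold_level(m, kappa=1.0)
kept = hard_threshold([0.5, 0.01, -0.3, 0.05], tau)
```

With these toy numbers the level is roughly 0.083, so only the two largest coefficients survive. In practice `kappa` would have to be chosen large enough for the deviation bound to apply.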

Remark 3.5. By taking $\alpha_0 < 1/2$, Theorem 3.4 shows that in this case
the estimator can at most adapt to the correct smoothness within the range
$\alpha_0 + 1/\pi \le s \le \alpha_0/(1-2\alpha_0) < \infty$.

4 Proofs

4.1 Proof of Theorem 3.2 

We shall first introduce several auxiliary estimates which rely on classical
techniques of discretization of random processes. Unless otherwise
specified, $L^2$ abbreviates $L^2([0,1])$ and likewise for $L^\infty$.

If $g : [0,1] \to \mathbb{R}$ is piecewise continuously differentiable, we define for
$n \ge 1$

E/ ilY^aO- / 9{u)dufds) , (4.1) 

and 

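In the following, if $\mathcal{D}$ is a function class, we will sometimes write $\mathbb{E}_{\mathcal{D}}[\cdot]$
for $\mathbb{E}[\cdot\, 1_{\{\sigma^2 \in \mathcal{D}\}}]$. Clearly, if $\mathcal{D}_1 \subset \mathcal{D}_2$, we have for non-negative integrands
$\mathbb{E}_{\mathcal{D}_1}[\cdot] \le \mathbb{E}_{\mathcal{D}_2}[\cdot]$. For $c > 0$, let

$$\mathcal{D}_\infty(c) := \{f : [0,1] \to \mathbb{R}, \ \|f\|_{L^\infty} \le c\}.$$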

Throughout the remaining part of this paper, we extend pre-averaging
functions to the real line by $\lambda(t) = 0$ for all $t \in \mathbb{R} \setminus [0, 2)$.

Preliminaries : some estimates for the latent price X 

Lemma 4.1 (Discretisation effect). Let $g : [0,1] \to \mathbb{R}$ be a deterministic
function with piecewise continuous derivative, such that $g(1) = 0$. Work
under Assumption 2.1. For every $p \ge 1$ and $c > 0$, we have

n „i 

Proof. By Assumption 2.1, using (2.4) and anticipating that rates of convergence
are in power of $n$, we may (and will) assume that $X$ is a local
martingale and take subsequently $b = 0$. Next, by Cauchy-Schwarz, we
split the error term into a constant times $I \times II + III \times II$, with

n /"^ 2p-|i/2 

/ := E^^(,) / g{s)dXs 

LI JO -I 

n " r^ 

LI ,=1 ^0 



<i + n. 



Define the stopping time

$$T_c := \inf\{s \ge 0, \ \sigma_s^2 > c\} \wedge 1.$$

On $\{\sigma^2 \in \mathcal{D}_\infty(c)\}$, we have $T_c = 1$, thus



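$$\mathbb{E}_{\mathcal{D}_\infty(c)}\Big[\Big|\int_0^1 g(s)\,dX_s\Big|^{2p}\Big] = \mathbb{E}\Big[\Big|\int_0^1 g(s)\,dX_s\Big|^{2p}\, 1_{\{\sigma^2 \in \mathcal{D}_\infty(c)\}}\Big] \le \mathbb{E}\Big[\Big|\int_0^{T_c} g(s)\,dX_s\Big|^{2p}\Big].$$
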



By the Burkholder-Davis-Gundy inequality (later abbreviated by BDG, for a
reference see [27], p. 166), we have



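$$I \le \mathbb{E}\Big[\Big|\int_0^{T_c} g(s)\,dX_s\Big|^{2p}\Big]^{1/2} \lesssim \mathbb{E}\Big[\Big|\int_0^{T_c} g^2(s)\,\sigma_s^2\,ds\Big|^{p}\Big]^{1/2} \lesssim \|g\|_{L^2}^{p},$$

where we used that $\sigma_s^2 \le c$ for $s \le T_c$. For the term $II$, note first that if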



n n 



9is) ■■=Y.iTlY.9'ii))hj~i)/n,j/n){s), SG [0,1], 

j=l l=j 



the process St = Jq " {g{s) + g{s))dXs, t £ [0, 1] is a martingale and 



" rj/n / " 



g{u)du\ \s<Tc}d{X), 



By summation by parts, we derive 



u 
We further need some analytical properties of pre-averaging functions.
In the following A, and A always denote a pre-averaging function and its 
normalized version (in the sense of Definition 3.1). We set 



A(s) := / X{u)du I[o,2] is) 



and 



(4.2) 



Ms) ■={{f Ku)duf + ( / ' Ku)duf) "^ I[o,i] {s) . (4.3) 

Note that for i = 2, . . . ,m 

||A(m.- (i-2))||i2[o,i] =m"^/^||A||i2[o,2] 

and 

||A(m.- (i-l))||L2[o,i] ="1"^/^ 

Lemma 4.2. For $m \le n$, we have

$$\mathfrak{R}_n[\Lambda(m\cdot - (i-2))] \lesssim n^{-1}$$

and for $i = 2, \ldots, m$

$$\|\Lambda(m\cdot - (i-2))\|_{L^2} = m^{-1/2}.$$

Proof. Recall the definition of $\mathfrak{R}_n$ given in (4.1) and let

j*(r) := max{j : j/n < r/m}. (4.4) 

Since A is bounded, we have 

max sup - > X(m- — (i — 2)) — / X(mu — (i — 2))du 



-\ m ' m] *^ L n 'nj 



< max 



sup 



n \ m 'mi si^l ^ ,^J 



(j-l)/n. 



+ 



A(mn — {i — 2))du 
max > — Afm (i — 2)) — / Afmu — (i — 2))(in 



ie( 



in(*) 

E 

m 'mi ' J 



< n-\ 
whence the first part of the lemma. For the second part, we have to prove
that



I^IIl2[0,2] — 1- 



This readily follows from 

fl /.2 



\A\ 



L2[o,2] — I {I ^{u)du) ds + ( / X{u)du) ds 
( / X{u)dufds+ / ( / X{u)dufds 

Jo Jo Jl+s 

^ rs ^ /•! /•2 ^ 

( / X{u)du) ds + ( / X{u)du) ds = ||A||^2ro ii- 



JO 



Ji 



□



Lemma 4.3. Work under Assumption 2.1 and let $\Lambda$ be as in (4.2) with $\lambda$
as in Definition 3.1. Then, for $m \le n$, every $p \ge 1$ and $c > 0$, we have



E- 



(c)[E5(l^)(/ A(ms-(i-2))(iX 
/ Y,g{^)K\ms-{i-2))d{X) 

■JO TTr, 



$$\lesssim \|g\|_{L^\infty}^{p}\, |\mathrm{supp}(g)|^{p/2}\, m^{-p/2},$$

where $|\mathrm{supp}(g)|$ denotes the support length of $g$.

Proof. In the same way as for Lemma 4.1, we may (and will) assume that
$X$ is a local martingale. For $i = 2, \ldots, m$ and $t \in [0, 1]$, set



Ht,^ := g(^)A(mt - (i - 2)) / A{ms - {i - 2))dX, 1.^.2 n (*)• 

J{i~2)/m. \ m 'm\ 

(4.5) 



For a continuous semimartingale $M$ starting at zero, we have the integration
by parts formula $M^2 = \langle M \rangle + 2\int M\,dM$. Thus,



E^(^; 



1=2 



1 n2 

A(ms- (z-2))(iXs 



k^[ms-{i-2))d{X), 



"^ I'ijra 

2J2 Ht,idXt. 

1=2 •'{«-2)/m 



(4.6) 






For $t \in [0,1]$, the process $\sum_{i=2}^{m} H_{t,i}$ is continuous (because of $\Lambda(0) =
\Lambda(2) = 0$) and adapted, hence $\int_0^{\cdot} \sum_{i=2}^{m} H_{s,i}\,dX_s$ is a continuous local
martingale. Applying BDG and the localisation argument of Lemma 4.1,
we obtain



E. 



'» 00(c) 



[l / ^y^Ht,^dXt\ 

^0 i=2 



^E[| / ^(^i/,,)^din <E[| / ^Y.^idt]"'"] 

■^0 i=2 J^ i=2 



<¥.[\m-^Y.^Hl 



n2|P/2i 



\snM9)Y''^-^rn-^Y.^[{Hm, 



i=2 



i=2 



where $H_i^* := \sup_{t \le T_c} |H_{t,i}|$ and where we used that $t \mapsto H_{t,i}$ has compact
support with length of order $m^{-1}$. The last estimate followed by Hölder
inequality. By BDG again, we derive

r-((J-2)/m+t)ATc 



E[iH:r]<\g{^i^)\'E 



<\q('^)\''E 



^^ I " V m. 



sup 

t<2/m 
Tc 



(j-2)/mATc 



A(ms- (i-2))(iX, 



{i-2)/m/\Tc 



h?[ms- {i- 2)) aids 



p/2- 



< 



k(^)| 



l\\Prr,-P/^ 



m 



The result follows. 



(4.7) 

n 



Lemma 4.4. Work under Assumption 2.1. Let $B^s_{\pi,\infty}(c)$ denote a Besov
ball with $s > 1/\pi$ and $c > 0$.

In the same setting as in Lemma 4.3, for every $p \ge 1$, we have



E, 



^' ^=2 J^ ^ 



■SJ,oo{c 



^ |yll,m'"' ^ |y|va 

where 



,?n 



\g{())+g{l)\+Y. ^^P l5(t)- 5(^)1- (4- 

j^j^ s,te[(i-l)/m,j/m] 



Proof. Recall from Section 3.1 that 



X,„.(A):=^ E AK-(.-2))X 



j/n- 



ie(i=2 n 

n V m 'mj 
Since $s > 1/\pi$, the class $B^s_{\pi,\infty}(c) \subset \mathcal{D}_\infty(c')$ for some $c' = c'(s,\pi,c)$.
Therefore, by Lemma 4.1, we have



E 



'Bj.oc(c) 



X 



{ms-{i-2))dX^ " < m^^/^j^-P (4.9) 



since, by Lemma 4.2, $\|\Lambda(m\cdot - (i-2))\|_{L^2} = m^{-1/2}$, $\mathfrak{R}_n[\Lambda(m\cdot - (i-2))] \lesssim n^{-1}$
and $m \le n$. By Hölder inequality it follows



JEss,.(c)[|E^(l^)<"^-E5(^)(/, A(m.-(z-2))dX. 
<|supp(g)rimP-i 



j=2 



xE 



■Bloo(c) 






i=2 



Xt^- i / A(ms-(i-2))dX 

<|b||^^mP/2n-P|supp(g)r, 

(4.10) 



which can be further bounded by $\|g\|_{L^\infty}^{p}\, m^{-p/2}\, |\mathrm{supp}(g)|^{p/2}$. By Lemma
4.3, we have



Ebs,^(c)[|E5(i^)(/ A(m.-(i-2))dX 
- / E5(^)^'("^^ - (^ - 2))^'^^ 1 ^ lbllioom-P/2|supp(5)r/^ 



j=2 

therefore by the triangle inequality 

L] 



E 



Bj,oo(c) 



[|E^(^) 






j=2 



E^(^)^^ 



, ms — u 



2))<T2ds 



i=2 



(4.11) 
We are going to force the function A in (4.11). To this end, note that

m 

i=2 
m 

•_i ^ ^ \ m 'ml 

m 

+ E (5(^) - 9{iS)^'{ms - (z - 2))!.^ n [s) 

■ - ^ ' \ m 'mJ 



i=l 



-5 



(0)A2 [ms + 1)1. ^n (s) - g{l)K^ {ms - {m - l))l/, i ^i {s). (4.12) 

Moreover, because of $\lambda(u) = -\lambda(2-u)$, we have $\Lambda^2(u) = \Lambda^2(2-u)$ and
also $\Lambda(0) = 0$,

rl — {ms — {i—l)) ^ 

K\ms-{i-2)) = { \[u)du)\ for.G(^,;^], 



h?{ms- {i-l)) 



ms— (i— 1) 



M")rf^) . for^e(^,^] 



< blvar.m"! ^ (4.14) 



This gives for s G (^j ;^] > and A as in (4.3) 

A^ [ms -{i- 1)) = A^ [ms - (i - 2)) + A^ [ms - (i - 1)) , (4.13) 

and otherwise. From (4.12) it follows that on the event a'^ G B^ ^{c) 
„i m 

■^0 i=2 

Finally, we have for a^ G B^ ^^(c) using ||A||^2 = 1 
/■I "^ 
/ E5(^) (^'("^^ - (^ - 1)) - hi=lM ^'0^'^' 

■^0 1=2 
„1 m 

Jo •_2 \ m 'mi 



< jn"™™''^'^^"'^/'^'"^^ 



91,' 



(4.15) 
the last estimate coming from the Sobolev embedding $B^s_{\pi,\infty} \subset B^{s-1/\pi}_{\infty,\infty}$
which contains Hölder continuous functions of smoothness $\min\{s - 1/\pi, 1\}$.
Since for $\sigma^2 \in B^s_{\pi,\infty}(c)$

„1 m 

the conclusion follows by combining (4.11), (4.14), (4.15) and (4.16). □





/■I 


_i i 1 {s)alds - 

m ' m. 


/ gisWsds 
Jo 



^9{:k)^(t:i ±-]{s)(r'^ds - / g{s)a'^ds <m ^Is^lvar.m, (4.16) 



Preliminaries: some estimates for the microstructure noise $\varepsilon$

We need some notation. Remember from (1.1) that we observe

$$Z_{j,n} = X_{j/n} + a(j/n, X_{j/n})\,\eta_{j,n}, \quad j = 0, \ldots, n,$$

where the intensity of microstructure noise process $a_s := a(s, X_s)$ and
noise innovations $\eta_{j,n}$ satisfy Assumption 2.2. For a pre-averaging function
$\lambda$, recall from (3.1) that we define

$$\bar\varepsilon_{i,m} := \bar\varepsilon_{i,m}(\lambda) := \frac{m}{n} \sum_{j/n \in ((i-2)/m,\, i/m]} \lambda\big(m\tfrac{j}{n} - (i-2)\big)\,\varepsilon_{j,n}, \quad i = 2, \ldots, m. \quad (4.17)$$
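
As an illustration of the pre-averaging operation recalled above, the following Python sketch computes block-weighted averages of a sequence of observations. The weight $(\pi/2)\cos(\pi u/2)$ on $[0,2]$ is only an assumed example satisfying $\lambda(u) = -\lambda(2-u)$, the normalisation $m/n$ is read off the order $m^{1/2}/n^{1/2}$ quoted for the pre-averaged noise, and the helper names are hypothetical.

```python
import math

def lam(u):
    """Assumed example weight (pi/2) * cos(pi * u / 2) on [0, 2], zero elsewhere."""
    return 0.5 * math.pi * math.cos(0.5 * math.pi * u) if 0.0 <= u <= 2.0 else 0.0

def pre_average(y, m, weight=lam):
    """Pre-averaged values for i = 2, ..., m from observations y[0], ..., y[n].

    Block i uses the indices j with (i-2)/m < j/n <= i/m and returns
    (m/n) * sum_j weight(m*j/n - (i-2)) * y[j].
    """
    n = len(y) - 1
    out = []
    for i in range(2, m + 1):
        j_lo = ((i - 2) * n) // m + 1   # smallest j with j/n > (i-2)/m
        j_hi = (i * n) // m             # largest j with j/n <= i/m
        total = sum(weight(m * j / n - (i - 2)) * y[j] for j in range(j_lo, j_hi + 1))
        out.append(m / n * total)
    return out

# Toy usage: a constant signal is almost annihilated because the weight is antisymmetric.
n, m = 1000, 10
values = pre_average([1.0] * (n + 1), m)
```

For the constant toy signal each pre-averaged value is of order $m/n$, which reflects the fact that an antisymmetric weight removes the level of the signal and only reacts to its increments.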



Moreover, we will make use several times of Rosenthal's inequality for
martingales (see [22], p. 23). It states that for an $(\mathcal{F}_k)_k$-martingale $(M_k)_k$
and for p > 0, there exists a universal constant $C_p$ only depending on $p$,
such that



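$$\mathbb{E}\Big[\max_{k=1,\ldots,n} |M_k|^p\Big] \le C_p \Big( \mathbb{E}\Big[\Big(\sum_{k=0}^{n-1} \mathbb{E}\big[(M_{k+1} - M_k)^2 \,\big|\, \mathcal{F}_k\big]\Big)^{p/2}\Big] + \mathbb{E}\Big[\max_{k=1,\ldots,n} |M_k - M_{k-1}|^p\Big] \Big).$$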

For our proofs it will be sufficient to bound the maximum in the second
term on the r.h.s. by the sum $\sum_{k=1}^{n}$.

Lemma 4.5. Work under Assumption 2.1 and 2.2. Let $\mathcal{G}$ denote the
$\sigma$-field generated by $(X_s, s \in [0, 1])$. For every function $g : [0, 1] \to \mathbb{R}$ and
$p \ge 1$, we have



E 



m 



i=l 



r^ \y\2,m"'' "' ^ \y\p,m"'' '' 
Proof. In the following, we will decompose the sum in the previous
inequality in an even and odd part. This allows us to treat sums of
pre-averaged values computed over disjoint intervals. In a first step, let us
introduce the filtrations

$$\mathcal{F}_r^{\mathrm{even}} := \sigma(\eta_{j,n} : j/n \le 2r/m) \otimes \sigma(X_s : s \le 2r/m),$$

$$\mathcal{F}_r^{\mathrm{odd}} := \sigma(\eta_{j,n} : j/n \le (2r+1)/m) \otimes \sigma(X_s : s \le (2r+1)/m).$$



Straightforward calculations show that the partial sums Sl^ 
and 3°/" := ^[^^ f/gi+i with 



even 

r 



U^■■=9m(e 



.2 
-i,m 



i E A^K-(-2))a|/„ 

n \ m 'm\ 

form martingale schemes (i = 1, . . . ,r < [r7T,/2j) with respect to J-"""""" 
and T°'^'^ respectively. Intuitively, li^rn = Op{'m}/'^ /■n}/'^) by (4.17). More 
precisely using Rosenthal's inequality, we have, for every p > 1 



E 



-i,m n^ 



^ J?[mi-{^-2))al^ 



I \ m 'm\ 



,0 ^ fcl.m T^ K* Too U Too "t II' ^, "t ll' 1 



using ||a||Loo < 1. It follows that 



^ m\n <|5l^)rm-n 



m 



(4.18) 



Analogous computations show that 



E [t/|, I JTIT] < 5^ 



2/2^-1^ 



]E[4,™l-^IT]</(^) 



m n 



Therefore, applying Rosenthal's inequality again, we obtain 



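$$\mathbb{E}\big[\,|S^{\mathrm{even}}_{\lfloor m/2 \rfloor}|^p \,\big|\, \mathcal{G}\big] \lesssim |g|^p_{2,m}\, m^{3p/2}\, n^{-p} + |g|^p_{p,m}\, m^{p+1}\, n^{-p}.$$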



Likewise, we obtain the same estimate for $\mathbb{E}[\,|S^{\mathrm{odd}}_{\lfloor (m-1)/2 \rfloor}|^p \,|\, \mathcal{G}]$. The
conclusion follows. □

Lemma 4.6. In the same setting as in Lemma 4.5, we have, for every
$c > 0$ and $p \ge 1$

m 

i=l 
-IP /'^-P/2 



< lor In 



m + m 



3p/2+1 -3p/2 



n 



+ \9\ 



2,m 



P/2„-p/2 



m'^' n 



+ m ' n 



2p, -3p/2 



)• 
Proof. By Assumption 2.1 and the same localisation procedure as in the
proof of Lemma 4.1, up to losing some constant, we may (and will) assume
that $X$ is a local martingale such that $|\sigma_s^2| \le c$ almost surely and
subsequently work with $\mathbb{E}[\cdot]$ instead of $\mathbb{E}_{\mathcal{D}_\infty(c)}[\cdot]$.

In the same way as for the proof of Lemma 4.5, we define an $\mathcal{F}^{\mathrm{even}}$-martingale
by setting

r 
i=l 

and proceed for $S^{\mathrm{odd}}$ analogously. By Rosenthal's inequality for martingales
and Cauchy-Schwarz,



[m/2\ 



lE[|5L™/2jr] <m^/'n"^/'E[| ^ 5'(^) E [<^(A) | -^IT 



p/2- 



i=l 



Lm/2J 



E k(^)r(25[l^2.™(A)P^])^/^(lE[|62.™(A)|2^]) 



2pi ^ 1/2 



i=l 



Note that, 



E [\X,,miX)n < E ^ Y. ^("^^ - (^ - 2))(%n - ^(.-2)/,n) 



2P1 



n V m 'mj 



,2p^-2pi 



|2Pl 



+ m'^n^-'^ E[|X(,_2)/™| 'J, 
where we used the fact that, by Riemann's approximation, we have 



E M-^-(^-2)) 



< 1. 



(4.19) 



le(i^,Al 

n \ m 'mi 



-V M\|2P1 



It follows that E [|Xj_m(A)| ^] is less than 



|A||?LE 



sup |X(j_2)/m+s - X{i~2)/ri 
s<2/m 



|2p 



+ m^Pn^^PE\\X, 



(»-2)/mP^] 

(4.20) 



which in turn is of order ||A||^^m, P-|-?7i^*'n "^'P thanks to the localization 
argument for o". In a similar way, we obtain 

E [xl^mW I JTIT] < m-' + m\-^Xf^,_^y^ < m'^ + m^n-^ supX^. 
Recall that $\mathbb{E}[\,|\bar\varepsilon_{i,m}|^{2p}] \lesssim m^{p} n^{-p}$. Putting together these estimates, we
infer that $\mathbb{E}[\,|S^{\mathrm{even}}_{\lfloor m/2 \rfloor}|^{p}]$ satisfies the desired bound. We proceed likewise
for $S^{\mathrm{odd}}_{\lfloor (m-1)/2 \rfloor}$. The conclusion follows. □



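Preliminaries: some estimates for the bias correction b

We need some notation. Recall the bias correction defined in (3.2)

$$b(\lambda, Z_\bullet)_{i,m} := \frac{m^2}{2n^2} \sum_{j/n \in ((i-2)/m,\, i/m]} \lambda^2\big(m\tfrac{j}{n} - (i-2)\big)\,(Z_{j,n} - Z_{j-1,n})^2.$$

We plan to use the following decomposition

$$b(\lambda, Z_\bullet)_{i,m} = b(\lambda, X_\bullet)_{i,m} + b(\lambda, \varepsilon_\bullet)_{i,m} + 2c(\lambda, X_\bullet, \varepsilon_\bullet)_{i,m},$$

where

$$c(\lambda, X_\bullet, \varepsilon_\bullet)_{i,m} := \frac{m^2}{2n^2} \sum_{j/n \in ((i-2)/m,\, i/m]} \lambda^2\big(m\tfrac{j}{n} - (i-2)\big)\,(X_{j/n} - X_{(j-1)/n})(\varepsilon_{j,n} - \varepsilon_{j-1,n}).$$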
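
The decomposition of the bias correction into a signal part, a noise part and a cross term is a purely algebraic consequence of expanding the squared increments of $Z = X + \varepsilon$. The Python sketch below checks it numerically on simulated toy data. The weight function, the prefactor $m^2/(2n^2)$ and all helper names are assumptions made for the illustration; the identity itself does not depend on them.

```python
import math
import random

def lam(u):
    """Assumed example weight (pi/2) * cos(pi * u / 2) on [0, 2], zero elsewhere."""
    return 0.5 * math.pi * math.cos(0.5 * math.pi * u) if 0.0 <= u <= 2.0 else 0.0

def weighted_cross(u_seq, v_seq, i, m, weight=lam):
    """(m^2 / (2 n^2)) * sum_j weight(m*j/n - (i-2))^2 * (u_j - u_{j-1}) * (v_j - v_{j-1}),
    summed over the j with (i-2)/m < j/n <= i/m. The prefactor is an assumption."""
    n = len(u_seq) - 1
    j_lo = max(((i - 2) * n) // m + 1, 1)
    j_hi = (i * n) // m
    total = sum(
        weight(m * j / n - (i - 2)) ** 2
        * (u_seq[j] - u_seq[j - 1]) * (v_seq[j] - v_seq[j - 1])
        for j in range(j_lo, j_hi + 1)
    )
    return m * m / (2.0 * n * n) * total

# Toy data: a random walk plus independent noise (illustrative only).
random.seed(0)
n, m, i = 400, 20, 5
x = [0.0]
for _ in range(n):
    x.append(x[-1] + random.gauss(0.0, n ** -0.5))
eps = [random.gauss(0.0, 0.1) for _ in range(n + 1)]
z = [a + b for a, b in zip(x, eps)]

b_z = weighted_cross(z, z, i, m)        # b(lambda, Z)
b_x = weighted_cross(x, x, i, m)        # b(lambda, X)
b_eps = weighted_cross(eps, eps, i, m)  # b(lambda, eps)
c_x_eps = weighted_cross(x, eps, i, m)  # c(lambda, X, eps)
```

Up to floating-point rounding, `b_z` equals `b_x + b_eps + 2 * c_x_eps`, which is the decomposition used in the lemmas that follow.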



Lemma 4.7. Work under Assumption 2.1 and 2.2. For every p > 1, we 
have 



m 

E "' 



n \ m 'mJ 
~ lyll,m'''' "' ^ \y\2,m'"' ''■ ^ \y\p,m.'"' "- 



Proof. By triangle inequality, we bound the error by a constant times 
where



i=2 j 

m 

II ■■=^ [\ E^(^) E^'(-^ - (^ - ^))-Ui^Un - 1) 

i=2 j n 

m 



[mi: - (i- 2))(a^j - a)^^ 



1=2 



n n 



rn 

IV :=E [| Y^gi^i;^) E^'(-^ - (^ - 2))^.-i,n^ 



i=2 



where, as before, the sum $\sum_j$ expands over $\{j/n \in ((i-2)/m, i/m]\}$.

• The terms $I$ and $II$. We only bound $I$, the same subsequent arguments
applying for the term involving $\eta_{j-1,n}$. Let $\mathcal{F}_j = \sigma(\eta_{k,n} : k \le j) \otimes \sigma(X_s :
s \le 1)$. By Rosenthal's inequality for martingales,



n m 



^^E Eb(^)l%i.r^^iO^[|(^l,n-i) 



j=l i=2 
n m 



L 71 \ m 'mi 

j—l j_2 L n \ m ' mi S 

<\n\P n-\-\n\P n"^!'^ 

where we used the fact that the functions a and A are bounded. 



p/2 



• The term ///. Recall the definition of j*(r) given in (4.4). Summing 
by parts, we have 

n \ m 'mi 

= - E «0-i)/n P' (-^ - (^ - 2)) - A^ (-¥ - (^ - 2) 

n V m 'mJ 

+ 4»/n^'(-^ - (^ - 2)) - 4(-2)/„A^(-^^ - (^ - 2)). 
Since $a$ is bounded and $\lambda$ has finite variation, we infer



Earn E A^K-(^-2))(a?/.-«0-i)/J 



^ I IP p 



i=2 



n \ m 'mi 



• The term $IV$. We may split the sum with respect to $j$ in even and
odd part. Proceeding as for $I$ and $II$, we readily obtain



TV < ln|P nP/^ -I- \n\P n 



U 



Lemma 4.8. In the same setting as in Lemma 4.1, for every $c > 0$, we
have

m, 

PI 



E^^(c)[|E5(^)KA,X.),. 



i=2 



^ bl?,™"!^?! P. 



Proof. In the same way as in the proof of Lemma 4.6, we may (and will)
assume that $X$ is a local martingale and that $|\sigma_s^2| \le c$ almost surely,
working subsequently with $\mathbb{E}[\cdot]$ instead of $\mathbb{E}_{\mathcal{D}_\infty(c)}[\cdot]$. We readily obtain



E 






i=2 



1=2 



{i-2)/m) 



n \ m 'mi 



< \n\P rnPn~P 



where we bound $|X_{j/n} - X_{(i-2)/m}|$ by the supremum over $|X_{s+(i-2)/m} -
X_{(i-2)/m}|$, $s \le 2/m$, and argue as in (4.20). □

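Let $M$ be a continuous, locally square integrable $\mathcal{F}$-martingale and $H$
some progressively measurable process. Then, for $0 \le s \le t \le 1$

$$\mathbb{E}\Big[\Big(\int_s^t H_u\,dM_u\Big)^2 \,\Big|\, \mathcal{F}_s\Big] = \mathbb{E}\Big[\int_s^t H_u^2\,d\langle M\rangle_u \,\Big|\, \mathcal{F}_s\Big],$$

provided that $\mathbb{E}[\int_0^1 H_u^2\,d\langle M\rangle_u] < \infty$. This fact will be referred to in the
sequel as conditional Itô-isometry (cf. [27], Section 3.2 B).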
Lemma 4.9. In the same setting as in Lemma 4.1, for every $c > 0$, we
have

m 



E- 



■»oo(c) 



[|E5(^)^(A,^.,e.k. 



1=2 



IP r,.-P/2+l , ™"P/2+lM^2p„-2p 



Proof. As in Lemmas 4.6 and 4.8, we may (and will) assume that $X$ is
a local martingale and that $|\sigma_s^2| \le c$ almost surely, working subsequently
with $\mathbb{E}[\cdot]$ instead of $\mathbb{E}_{\mathcal{D}_\infty(c)}[\cdot]$. It suffices then to bound



E 



[\ Y.^{'^) E ^'(-^ - (^ - 2))(%n - ^0-l)/n)^.,n 



i=l 



n \ rra ' rraJ 



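Recall that $j^*(r) = \max\{j : j/n \le r/m\}$ and let us introduce the
filtrations

$$\mathcal{G}_r^{\mathrm{even}} := \sigma(\eta_{j,n} : j/n \le 2r/m) \otimes \sigma(X_s : s \le j^*(2r)/n),$$

$$\mathcal{G}_r^{\mathrm{odd}} := \sigma(\eta_{j,n} : j/n \le (2r+1)/m) \otimes \sigma(X_s : s \le j^*(2r+1)/n).$$

The process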

r 

Sr- := E5(^) E ^'("^^ - (2^ - 2))(%n - ^0-l)/n)6,,n 

i=l j / 2i-2 2J1 

n V ?n 'mJ 

is a $\mathcal{G}^{\mathrm{even}}$-martingale and likewise for $S^{\mathrm{odd}}$ defined similarly w.r.t. the
filtration $\mathcal{G}^{\mathrm{odd}}$. Moreover, on one hand



E 



'(^) E ^'("^^ - (* - 2))(^.7n - ^0-l)/n)6 



71 \ m 'mJ 



n \ m ' m J 

and on the other hand by conditional Ito-isometry 



E 



(5(^) E ^'("^^ - (2^ - 2))(^./n - ^0-l)/n)6,. 



/->cvcn 



J / 2i-2 2^1 
n \ rn 'mi 



j ( 2i-2 2*1 
n \ m ' mi 
Therefore, by Rosenthal's inequality for martingales, we infer

ip. n ccvon IP] < I |p / -p/2+1 . ^-p/2+l^ I \n\P 

We proceed likewise for $S^{\mathrm{odd}}_{\lfloor (m-1)/2 \rfloor}$ and the conclusion follows by incorporating
the multiplicative term $m^{2p} n^{-2p}$ in front of the two error terms. □



Completion of proof of Theorem 3.2 

Since 

m 

i=2 

we plan to use the following decomposition 

$$\mathcal{E}_m(h_{\ell k}) - \langle \sigma^2, h_{\ell k}\rangle_{L^2} = I + II + III, \quad (4.21)$$



with 



1=2 
m 

1=2 
m 

III '■= ^/ ^^ffc(^)^i,mgi,m- 



i=2 

The term /. By Lemma 4.4, we have 



^B^^ic)[\in<\\h,k\\l^m-P^'\snpp{h 



P/2|«„T.r,/'^., ^|P/2 



ik 



'\'''ik\lm I I'l'eklv; 



var,m'''' 



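Note that $\|h_{\ell k}\|_{L^\infty} \le 2^{\ell/2}\|h\|_{L^\infty}$ and $|\mathrm{supp}(h_{\ell k})|^{p/2} \lesssim 2^{-\ell p/2}$. By assumption,
$h$ has a piecewise Lipschitz derivative. With (4.8), we conclude

$$|h_{\ell k}|_{\mathrm{var},m} \lesssim m^{1/2}. \quad (4.22)$$

Thus, the term $I$ has the right order.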

• The term $II$. Applying successively Lemmas 4.5, 4.7, 4.8 and 4.9, we
derive using $m \le n^{1/2}$

Ebj,^ [l^^r] < \h,k\lm^nPn~P + \h,k\lmr^t'^'^n-P + \h,^,\l^^mP+^n-P. 
Since for $1 \le p < 2$, by Jensen's inequality $\mathbb{E}_{B^s_{\pi,\infty}(c)}[|II|^p] \le \mathbb{E}_{B^s_{\pi,\infty}(c)}[|II|^2]^{p/2}$
and for $p \ge 2$, $|h_{\ell k}|^p_{p,m}\, m^{p+1} n^{-p} \lesssim 2^{\ell(p/2-1)} m^{p+1} n^{-p} \le m^{3p/2} n^{-p}$, this
term also has the right order.



• The term ///. Finally, by Lemma 4.6, we have 

r p- 

Ebj,^{c) [ III 

< \h,k\U{n-''/'m + m'P/'^'n-'P/') + \h,\l^{mP/'n-P/' + m'Pn-^P/'), 



which also has the right order by the same argument as above. The proof 
of Theorem 3.2 is complete. 

4.2 Proof of Theorem 3.3 

4.2.1 Preliminary: a martingale deviation inequality 

If $(M_k)$ is a locally square integrable $(\mathcal{F}_k)$-martingale with $M_0 = 0$, we
denote by $[M]_k = \sum_{i=1}^{k} (\Delta M_i)^2$ with $\Delta M_i = M_i - M_{i-1}$ its quadratic
variation and by $\langle M\rangle_k = \sum_{i=1}^{k} \mathbb{E}[(\Delta M_i)^2 \,|\, \mathcal{F}_{i-1}]$ its predictable compensator.
We will heavily rely on the following result of Bercu and Touati [6].

Theorem 4.10 (Bercu and Touati [6]). Let $(M_k)$ be a locally square
integrable martingale. Then, for all $x, y > 0$, we have

$$\mathbb{P}\big[\,|M_k| \ge x, \ [M]_k + \langle M\rangle_k \le y\big] \le 2\exp\Big(-\frac{x^2}{2y}\Big).$$

From Theorem 4.10, we infer the following estimate

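Lemma 4.11. Let $(M_j)$ be a locally square integrable $(\mathcal{F}_j)$-martingale. Suppose
that for $p \ge 1$ there is some deterministic sequence $(C_j)_j$ (with
$j = j(m)$) and $\delta > 0$ such that $\mathbb{P}[\langle M\rangle_j > C_j(1+\delta)] \lesssim m^{-p}$. If further for
every $\kappa \ge 2$

$$\max_{i=1,\ldots,j} \mathbb{E}\big[\,|\Delta M_i|^{\kappa}\big] \lesssim 1, \quad (4.23)$$

then

$$\mathbb{P}\Big[\,|M_j| > 2(1+\delta)\sqrt{C_j\, p \log m}\,\Big] \lesssim m^{-p},$$

provided $m^{q_0} \le j \le m$ for some $0 < q_0 \le 1$ and there is an $\varepsilon > 0$ such
that $C_j \ge j^{1/2+\varepsilon}$.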
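Proof. We have by Theorem 4.10

$$\mathbb{P}\Big[\,|M_j| > 2(1+\delta)\sqrt{C_j\, p \log m}\,\Big] \le 2m^{-p} + \mathbb{P}\big[[M]_j + \langle M\rangle_j > y, \ \langle M\rangle_j \le C_j(1+\delta)\big] + \mathbb{P}\big[\langle M\rangle_j > C_j(1+\delta)\big],$$

with $y = 2C_j(1+2\delta)$. Further we obtain

$$\mathbb{P}\big[[M]_j + \langle M\rangle_j > y, \ \langle M\rangle_j \le C_j(1+\delta)\big] \le \mathbb{P}\big[[M]_j - \langle M\rangle_j > 2C_j\delta\big].$$

Since $([M]_j - \langle M\rangle_j)$ is a $\mathcal{F}_j$-martingale it follows by Chebycheff's and
Rosenthal's inequality for martingales and $\kappa \ge 2$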

P [[M], - {M), > 2C,S] < C-^E [\[M], - (M),f ^ 

j J 

< Cr-^E\AMif^ + Cr-E\Y,^ [(AM)^|J-,_i] 

where we used Holder's inequality 



k/2 



i=l 



J 



E^E[(AM)f|J-, 



i-l 



k/2 






< r/^^1 ^E [e (|AM,|2-| J-,_i)] < J-/2. 



Choosing $\kappa := p/(q_0\varepsilon) \ge 2$, we finally obtain

$$\mathbb{P}\big[[M]_j + \langle M\rangle_j > y, \ \langle M\rangle_j \le C_j(1+\delta)\big] \lesssim j^{-p/q_0} \le m^{-p}.$$

□
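
The term $2m^{-p}$ in the proof of Lemma 4.11 comes from plugging the level $x = 2(1+\delta)\sqrt{C_j\, p\log m}$ and the choice $y = 2C_j(1+2\delta)$ into the exponential bound of Theorem 4.10. The short computation below spells out this step; it is a worked check under exactly these two choices and uses no further assumption.

```latex
% Exponent in Theorem 4.10 with
% x = 2(1+\delta)\sqrt{C_j\, p \log m} and y = 2 C_j (1 + 2\delta).
\[
\frac{x^2}{2y}
  = \frac{4(1+\delta)^2\, C_j\, p \log m}{4\, C_j\, (1+2\delta)}
  = \frac{(1+\delta)^2}{1+2\delta}\; p \log m
  \;\ge\; p \log m,
\]
% because (1+\delta)^2 = 1 + 2\delta + \delta^2 \ge 1 + 2\delta. Hence
\[
2\exp\Big(-\frac{x^2}{2y}\Big) \;\le\; 2\exp(-p\log m) \;=\; 2\, m^{-p}.
\]
```

The slack factor $(1+\delta)^2/(1+2\delta)$ is what leaves room for the event $\{\langle M\rangle_j \le C_j(1+\delta)\}$ to be combined with the control of $[M]_j - \langle M\rangle_j$ at level $2C_j\delta$.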



Lemma 4.12. Work under the assumptions of Theorem 3.3 and suppose
that $X$ has no drift, i.e. $b = 0$. If $\bar c = \bar c(s,\pi,c)$ is such that $B^s_{\pi,\infty}(c) \subset
\mathcal{D}_\infty(\bar c)$ then, we have for every fixed $\delta > 0$



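$$\mathbb{P}\Big[\,\Big|\sum_{i=2}^{m} h_{\ell k}\big(\tfrac{i-1}{m}\big)\,\bar X^2_{i,m}(\lambda) - \langle \sigma^2, h_{\ell k}\rangle_{L^2}\Big| > 4\bar c\,(1+\delta)\sqrt{\frac{p\log m}{m}} \ \text{ and } \ \sigma^2 \in B^s_{\pi,\infty}(c)\Big] \lesssim m^{-p},$$

provided

$$m^{-(s-1/\pi)}\,|h_{\ell k}|_{1,m} \lesssim m^{-1/2}.$$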
Proof. Recall that $\Lambda(s) = \int_s^2 \lambda(u)\,du$ and let $H_{t,i}$ be defined as in (4.5),
where $g$ is replaced by $h_{\ell k}$. Using the integration by parts formula (4.6)
we bound the probability by $I + II + III$, with

/ :=P \\Y^h,,{^){xl^{\) - ( / K{ms-{^-2))dX,f) 
> c<5/^ and a' G B^^ (c 



/I: 



>2c(l + i)V^^anda2GP^(c) 
m:=P V/i,fc(^)( / A2(ms-(i-2))a2cis-(a^/^,fc>i2) 



> c5 



plogm 



and a^ G B^ (c) 



Note that $\mathbb{P}[X \ge t \text{ and } S] = \mathbb{E}[1_{\{X \ge t\}\cap S}] \le t^{-p}\,\mathbb{E}[X^p 1_S]$, for $p > 0$.
Using $m \le n^{1/2}$ and (4.10) we find that $I$ can be bounded by any polynomial
order of $1/m$.

The term $II$ can be bounded further by $II \le II_{\mathrm{even}} + II_{\mathrm{odd}}$, with



II 



even / odd • 



m 

E 



//t.id^t 







>c(l + i) 



plogm, 
m 



i=2, i even /odd ' 

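Since $h$ has support $[0, 1]$, $h_{\ell k}\big(\tfrac{2i-1}{m}\big) \ne 0$ can happen only if

$$\tfrac{1}{2}\big(k 2^{-\ell} m + 1\big) \le i \le \tfrac{1}{2}\big((k+1) 2^{-\ell} m + 1\big). \quad (4.24)$$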



We will treat the term $II_{\mathrm{even}}$ only, since similar arguments apply for $II_{\mathrm{odd}}$.
The process Mr := 2~^'^m Yll=i /n " Ht^2idXt is a martingale with respect 
to the filtration J> = a (Xg : s < 2r/m) starting at Mi (i.2-em+i)/2\ ~ ^• 
Recall that $H_{t,2i}$ vanishes outside $[2(i-1)/m, 2i/m]$ and $1_{\{T_c \le (2i-2)/m\}}$ is
$\mathcal{F}_{i-1}$ measurable. Moreover, uniformly in $k, \ell$, we obtain



[m/2\ 



- E ^'k (^) = \\hi,k\\l + 0{2'/m) = 1 + 0{m- 



(4.25) 



i=l 
Therefore, Lemma 4.2 and conditional Itô-isometry yield

Lm/2J 



^ \l{{k+l)2-tm+ 






Lm/2J 

< 2^^-'c' E /.i (^) < 2^'m \Z^ (1 + f ) , 



where the last inequality follows for all $m \ge m_0(\delta)$ and $m_0(\delta)$ is fixed and
independent of $\ell, k$. Furthermore, by BDG and (4.7), we bound



L Jo 










sup |^(t+2{i-l)/m)AT-,2J 
t<2/m 



m / I '^ 



uniformly over $i$. Since the number of integers $i$ for which (4.24) holds is
of order $m 2^{-\ell}$, we may apply Lemma 4.11 for $j \sim m 2^{-\ell}$, $C_j = 2^{-\ell} m\, \bar c^{\,2}$
and obtain $II_{\mathrm{even}} \lesssim m^{-p}$.

In the same way we bound $II_{\mathrm{odd}}$ and thus obtain $II \lesssim m^{-p}$.

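In order to bound $III$ it follows from $m^{-(s-1/\pi)}|h_{\ell k}|_{1,m} \lesssim m^{-1/2}$, (4.14),
(4.15), (4.16), and (4.22), that for sufficiently large $m$ on $\sigma^2 \in B^s_{\pi,\infty}(c)$

$$\Big|\sum_{i=2}^{m} h_{\ell k}\big(\tfrac{i-1}{m}\big) \int_0^1 \Lambda^2\big(ms - (i-2)\big)\,\sigma_s^2\,ds - \langle \sigma^2, h_{\ell k}\rangle_{L^2}\Big| \le \bar c\,\delta \sqrt{\frac{p\log m}{m}}.$$

This yields the conclusion. □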



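Lemma 4.13. Work under the assumptions of Theorem 3.3 and suppose
that $X$ has no drift, i.e. $b = 0$. Then, we have for every fixed $\delta > 0$

$$\mathbb{P}\Big[\,\Big|\sum_{i=2}^{m} h_{\ell k}\big(\tfrac{i-1}{m}\big)\,\bar X_{i,m}(\lambda)\,\bar\varepsilon_{i,m}(\lambda)\Big| > \sqrt{8\bar c}\,\|a\|_{L^\infty}\|\lambda\|_{L^2}(1+\delta)\sqrt{\frac{p\log m}{m}} \ \text{ and } \ \sigma^2 \in B^s_{\pi,\infty}(c)\Big] \lesssim m^{-p},$$

where $\bar c(s,\pi,c)$ is such that $B^s_{\pi,\infty}(c) \subset \mathcal{D}_\infty(\bar c)$.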
Proof. Let $\bar X_{i,m,T_c}$ be defined as $\bar X_{i,m}$ with $X_{j/n}$ replaced by $X_{j/n \wedge T_c}$.
Then by separating even and odd terms it suffices to show



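$$\mathbb{P}\Big[\,\Big|\sum_{i=2,\ i\ \mathrm{even}}^{m} h_{\ell k}\big(\tfrac{i-1}{m}\big)\,\bar X_{i,m,T_c}(\lambda)\,\bar\varepsilon_{i,m}\Big| > \sqrt{2\bar c}\,\|a\|_{L^\infty}\|\lambda\|_{L^2}(1+\delta)\sqrt{\frac{p\log m}{m}}\,\Big] \lesssim m^{-p},$$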



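since the same argumentation can be done for the sum
over odd $i$. Similar as in the proof of Lemma 4.12,
$M_r := n^{1/2} 2^{-\ell/2} \sum_{i=1}^{r} h_{\ell k}\big(\tfrac{2i-1}{m}\big)\,\bar X_{2i,m,T_c}\,\bar\varepsilon_{2i,m}$ defines a martingale with respect
to the filtration $\mathcal{F}^{\mathrm{even}}$, starting at $M_{\lfloor (k2^{-\ell}m+1)/2 \rfloor} = 0$.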



^ \\{{k+l)2~tm+l)\ 



Lm/2J ^ 

^)j <n2-^ E ^i(^)I^[^2i,™,T.4,r.|-^T] 

Lm/2J ^ 



j=i 



^ X\nil-{2i-2)). 



j ( 2i-2 2i-| 



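By the assumed piecewise Lipschitz continuity of $\lambda$ it follows

$$\frac{m}{n} \sum_{j/n \in ((2i-2)/m,\, 2i/m]} \lambda^2\big(m\tfrac{j}{n} - (2i-2)\big) = \|\lambda\|^2_{L^2} + O\big(\tfrac{m}{n}\big), \quad (4.26)$$

uniformly in $i$. Next, we will derive a bound for $\mathbb{E}[\bar X^2_{2i,m,T_c} \,|\, \mathcal{F}^{\mathrm{even}}_{i-1}]$. Note
that $\bar X_{2i,m,T_c} = U_1 + U_2$, with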



Ui-.-- 



E ( E ^("^^ - (2^ - 2))) i^l^r- - ^l^.T-.2-2) 



2/21^ 2il l=j 
n \ m ' mi 



-ATcA 



U2:=X 



2i-2 
m 



AT.n 



Y Km^-{2i-2)). 



j ( 2i-2 2J1 
n \ m 'mi 



Clearly, EfXai^^^T-l-^f-f] = K[C/f|J-fZi"] + Ul By conditional Ito- 
isometry 

E[{X 



cE 



n ri n n 



-1 

n 
where $W$ is a standard Brownian motion and $\delta_{j,j'}$ denotes the Kronecker
delta. Recall the definition of $j^*(r)$ given in (4.4) and define
Cj := Y.?=jK^^ - (2« - 2)). We can bound 






in(2»-2) 



l+j-(2i-2) n 



+ E 5 
i4 



' 2i-2 2i"| 
m 'mJ 



Cl+j,t(22-2)^j*(2»-2) + 



E ^^(^^ 



i p»-2 2il 
n \ m 'mJ 



n 



w 



n 



cE 



^ A(mi - (2i - 2))W^ 

j ( 2i-2 2il " 

n V m 'mi 



Setting X = W m (4.9) and Lemma 4.2 yield further 

E[[/f |Jj:!<^"] <cE\ f A^ms - {2i - 2))ds\ +0{m~^/\~^) 

= cm^^ + 0(m~^/^n~^) 

uniformly over i. By using (4.19) we infer that there exists a constant cjj 
such that C/| < cu^ sup ^^ji^X"^. Choose 6' < min(l, | min(||A||^2, !))■ 
We find by Chebycheff inequahty that ¥[^U^ > 6'] < m-P. With (4.25), 
we obtain further for the predictable quadratic variation, sufficiently large 
m and probability larger than 1— const. xm,~^ 

' ' ' l^ak+l)2-em+l)\ 



< 2-'-'m 



< 2' 



m \\a 



a\\l^ c(l + Oim-'^)) (llAlli. + 0(f )) (l + f C/| 
l^c{l + 6'){fX\\l+6'){l + S') 



< 2^^^^m\\a\\l^c\\X\\l2{l + S) 
or to state it differently 



' ' ' l^ak+l)2-<^m.+l)\ 



>2-^-^m\\a\\l^c\\X\\l2{l + 5) 



< 



m 



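In the next step, we bound $\max_i \mathbb{E}[|\Delta M_i|^{\kappa}]$. In the proof of Lemma 4.6,
we already derived $\mathbb{E}[|\bar X_{i,m}(\lambda)|^{2\kappa}] \lesssim m^{-\kappa}$ and $\mathbb{E}[|\bar\varepsilon_{i,m}|^{2\kappa}] \lesssim m^{\kappa} n^{-\kappa}$. By
the same arguments we obtain also $\mathbb{E}[|\bar X_{i,m,T_c}(\lambda)|^{2\kappa}] \lesssim m^{-\kappa}$. Therefore,
it is easy to see that

$$\max_i \mathbb{E}\big[|\Delta M_i|^{\kappa}\big] \le 2^{-\ell\kappa/2}\, n^{\kappa/2}\, \big|h_{\ell k}\big(\tfrac{2i-1}{m}\big)\big|^{\kappa}\, \mathbb{E}^{1/2}\big[|\bar X_{2i,m,T_c}(\lambda)|^{2\kappa}\big]\, \mathbb{E}^{1/2}\big[|\bar\varepsilon_{2i,m}|^{2\kappa}\big] \lesssim 1.$$

Hence the assumptions of Lemma 4.11 are satisfied with $j \sim m 2^{-\ell}$ and
$C_j = 2^{-\ell-1} m\, \|a\|^2_{L^\infty}\, \bar c\, \|\lambda\|^2_{L^2}$ and the conclusion follows. □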



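Lemma 4.14. Work under the assumptions of Theorem 3.3. Let $\mathcal{G}$ denote
the $\sigma$-field generated by $(X_s, s \in [0, 1])$. Then we have for every fixed $\delta > 0$

$$\mathbb{P}\Big[\,\Big|\sum_{i=2}^{m} h_{\ell k}\big(\tfrac{i-1}{m}\big)\big(\bar\varepsilon^2_{i,m}(\lambda) - \mathbb{E}[\bar\varepsilon^2_{i,m}(\lambda) \,|\, \mathcal{G}]\big)\Big| > 4\|a\|^2_{L^\infty}\|\lambda\|^2_{L^2}(1+\delta)\sqrt{\frac{p\log m}{m}}\,\Big] \lesssim m^{-p}.$$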

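Proof. We show that

$$\mathbb{P}\Big[\,\Big|\sum_{i=2,\ i\ \mathrm{even}}^{m} h_{\ell k}\big(\tfrac{i-1}{m}\big)\big(\bar\varepsilon^2_{i,m}(\lambda) - \mathbb{E}[\bar\varepsilon^2_{i,m}(\lambda) \,|\, \mathcal{G}]\big)\Big| > 2\|a\|^2_{L^\infty}\|\lambda\|^2_{L^2}(1+\delta)\sqrt{\frac{p\log m}{m}}\,\Big] \lesssim m^{-p}$$

and argue similar for the sum over $i$ odd. Let $\mathcal{F}_r^{\mathrm{even}}$, $U_i$ and the martingale
$S_r^{\mathrm{even}}$ be defined as in the proof of Lemma 4.5 with $g$ replaced by $h_{\ell k}$. Now
$h_{\ell k}\big(\tfrac{2i-1}{m}\big) \ne 0$ can happen only if $\tfrac{1}{2}(k2^{-\ell}m+1) \le i \le \tfrac{1}{2}((k+1)2^{-\ell}m+1)$. In
the following we will consider the martingale $M_r := \tfrac{n}{m}\, 2^{-\ell/2} S_r^{\mathrm{even}}$ started
at $M_{\lfloor (k2^{-\ell}m+1)/2 \rfloor} = 0$. We obtain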

Lm/2J 



(M) I „ < K2- 



i=\ 



2 ( 2i-\ 



)E 



^2i,m 



wl^,^\Q\f\n-r 



Elementary calculations and (4.26) show further that we may find a de- 
terministic bound, i.e. uniformly in i 



E 



fe? 



2 Hall 



]E[e?,^|a])Vri" 



E ^v^-(^-2)))'+o(f^) 



n \ m ' m] 

|4 ||\||4 ,n("T-^\ 
Il°° 11-^11x2 +'^l7r5"j- 
From this and (4.25) we obtain for sufficiently large $m$,

By (4.18), we infer $\mathbb{E}[|\Delta M_i|^{\kappa}] \lesssim 1$. Applying Lemma 4.11 yields the
conclusion. □



Completion of proof of Theorem 3.3 

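Let $I$, $II$, and $III$ be defined as in (4.21) and suppose that $X$ has no
drift.

• The term $I$. By Lemma 4.12, we have

$$\mathbb{P}\Big[\,|I| > 4\bar c\,(1+\delta)\sqrt{\frac{p\log m}{m}} \ \text{ and } \ \sigma^2 \in B^s_{\pi,\infty}(c)\Big] \lesssim m^{-p}.$$

• The term $II$. Applying Lemmas 4.14, 4.7, 4.8 and 4.9, we derive by
Chebycheff's inequality and $|h_{\ell k}|_{p,m} \lesssim m^{1/2-1/p}$, $p \ge 2$

$$\mathbb{P}\Big[\,|II| > 4\|a\|^2_{L^\infty}\|\lambda\|^2_{L^2}(1+\delta)\sqrt{\frac{p\log m}{m}} \ \text{ and } \ \sigma^2 \in B^s_{\pi,\infty}(c)\Big] \lesssim m^{-p}.$$

• The term $III$. We find by Lemma 4.13

$$\mathbb{P}\Big[\,|III| > 4\sqrt{2\bar c}\,\|a\|_{L^\infty}\|\lambda\|_{L^2}(1+\delta)\sqrt{\frac{p\log m}{m}} \ \text{ and } \ \sigma^2 \in B^s_{\pi,\infty}(c)\Big] \lesssim m^{-p}.$$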



If the drift is non-zero, we can argue by a change of measure as in Lemma 
4.1 and obtain with Assumption 2.1, E^^bi^sJ < E^,o[IbJ^''"^^/^- The 
proof of Theorem 3.3 is complete. 

4.3 Proof of Theorem 2.12 

Preliminaries. Let $(C, \mathcal{C})$ denote the space of continuous functions on
$[0, 1]$, equipped with the norm of uniform convergence and its Borel $\sigma$-field
$\mathcal{C}$. Let $(\Omega', \mathcal{F}', \mathbb{P}')$ be another probability space rich enough to contain
an infinite sequence of i.i.d. Gaussian random variables. On $(\Omega, \mathcal{F}) :=
(C \times C \times \Omega', \mathcal{C} \otimes \mathcal{C} \otimes \mathcal{F}')$ we construct a probability measure $\mathbb{P}$ as follows.
Let $(\sigma, \omega, \omega')$ denote a generic element of $\Omega$.
We pick an arbitrary probability measure $\mu(d\sigma)$ on $(C, \mathcal{C})$, and we
construct the measure $\mathbb{P}_\sigma(d\omega)$ on $(C, \mathcal{C})$ such that, under $\mathbb{P}_\sigma$, the canonical
process $X$ on $C$ is a solution (in a weak sense for instance) to

$$X_t = X_0 + \int_0^t \sigma_s\,dW_s,$$

where $W$ is a standard Wiener process. We then set

$$\mathbb{P} := \mu(d\sigma) \otimes \mathbb{P}_\sigma(d\omega) \otimes \mathbb{P}'(d\omega').$$

This space is rich enough to contain our model: indeed, by construction,
any $\mu(d\sigma)$ will be such that, under $\mu$, we have Assumption 2.1. By constructing
on $(\Omega', \mathcal{F}', \mathbb{P}')$ an i.i.d. Gaussian noise $(\varepsilon_{j,n})$ for $j = 0, \ldots, n$ with
constant variance function $a^2 > 0$ for a given $a^2 > 0$, the space $\Omega$ is rich
enough to contain an additive Gaussian microstructure noise, independent
of $X$, and we have Assumption 2.2. Consider next the statistical
experiment

$$\mathcal{E}_n = \big(C \times \Omega', \ \mathcal{C} \otimes \mathcal{F}', \ (\mathbb{P}^n_\sigma, \sigma \in \mathcal{D})\big),$$

where $\mathcal{D} \subset C$ and $\mathbb{P}^n_\sigma$ is the law of the data $(Z_{j,n})$, conditional on $\sigma$.
The probability $\mu(d\sigma)$ can be interpreted as a prior distribution for the
"true" parameter $\sigma$. Let us now introduce the statistical experiment $\mathcal{E}'_n$
generated by the observation of the Gaussian measure

where $B$ is a Gaussian white noise, with same parameter space $\mathcal{D}$, but
living on a possibly different space $\Omega''$. We denote by $\mathbb{Q}^n_\sigma$ the law of $Y_n$.

Completion of proof. Let $\mathcal{D} = B^s_{\pi,\infty}(c)$ denote a Besov ball such that
$s - 1/\pi > 0$. Then $\mathcal{D} \subset C$. Assume further that $\mu$ is such that $\mu[\mathcal{D}] = 1$.
Then Condition (2.6) is satisfied. Moreover, for any estimator $\hat\sigma_n$ and any
$c' > 0$, we have, by Markov inequality

^/E[||a„-a llLP([o,i])I|,.gg._^(,)}J 

Jc 

since $\mu[\mathcal{D}] = 1$. By the result of Reiß [33], since $s - 1/\pi > (1 + \sqrt{5})/4$,
we have that $\mathcal{E}_n$ and $\mathcal{E}'_n$ are asymptotically equivalent. This means that
we can approximate $\mathbb{P}^n_\sigma$ by $\mathbb{Q}^n_\sigma$ in variational norm, uniformly in $\sigma$, up to
randomisation via a Markov kernel $K$ that does not depend on $\sigma$. More
precisely, for any $\varepsilon > 0$, we have

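$$\Big|\,\mathbb{P}^n_\sigma\big[n^{\alpha(s,p,\pi)/2}\,\|\hat\sigma_n^2 - \sigma^2\|_{L^p([0,1])} \ge c'\big] - K\mathbb{Q}^n_\sigma\big[n^{\alpha(s,p,\pi)/2}\,\|\hat\sigma_n^2 - \sigma^2\|_{L^p([0,1])} \ge c'\big]\Big| \le \varepsilon \quad (4.28)$$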

as soon as $n$ is large enough, and where we use the notation

$$K\mathbb{Q}^n_\sigma(dx) = \int_{\Omega''} K(y, dx)\,\mathbb{Q}^n_\sigma(dy), \quad x \in C \times \Omega', \ y \in \Omega''.$$

Now, there exist $c' > 0$ and $\delta' > 0$ such that for any estimator $F$ in $\mathcal{E}'_n$,
by picking $\mu(d\sigma)$ as the least favourable prior in order to obtain lower
bounds over Besov classes, we have

$$\int_C \mu(d\sigma)\,\mathbb{Q}^n_\sigma\big[n^{\alpha(s,p,\pi)/2}\,\|F - \sigma^2\|_{L^p([0,1])} \ge c'\big] \ge \delta' > 0 \quad (4.29)$$

for large enough $n$. This follows from classical analysis of the white Gaussian
noise model, see for instance [23] in the framework of Besov spaces.
Let us extend further (4.29) to the class of randomised decisions, that is
estimators of the form $F(\xi, \cdot)$, where $\xi$ is an auxiliary random variable,
living on an auxiliary probability space with law $\nu(d\xi)$. Conditional on $\xi$,
an arbitrary randomised decision $F(\xi, \cdot)$ can be viewed as an estimator,
therefore, by (4.29), we also have

$$\int_C \mu(d\sigma)\,\mathbb{Q}^n_\sigma\big[n^{\alpha(s,p,\pi)/2}\,\|F(\xi, \cdot) - \sigma^2\|_{L^p([0,1])} \ge c'\big] \ge \delta' \quad \nu(d\xi)\text{-a.s.}$$

for large enough $n$. Integrating and applying Fubini, we derive

$$\int_C \mu(d\sigma) \int \nu(d\xi)\,\mathbb{Q}^n_\sigma\big[n^{\alpha(s,p,\pi)/2}\,\|F(\xi, \cdot) - \sigma^2\|_{L^p([0,1])} \ge c'\big] \ge \delta'.$$

Since $\nu$ and $F$ are arbitrary, it suffices then to identify the randomised
decision $F(\xi, \cdot)$ with the estimator $\hat\sigma_n$ in $\mathcal{E}_n$ transported into a random
decision in $\mathcal{E}'_n$ with the Markov kernel $K$ appearing in (4.28). We thus
obtain

$$\int_C \mu(d\sigma)\,K\mathbb{Q}^n_\sigma\big[n^{\alpha(s,p,\pi)/2}\,\|\hat\sigma_n^2 - \sigma^2\|_{L^p([0,1])} \ge c'\big] \ge \delta' \quad (4.30)$$

for large enough $n$. Putting together (4.27), (4.28) and (4.30), we finally
obtain

$$n^{\alpha(s,p,\pi)/2}\,\mathbb{E}\big[\|\hat\sigma_n^2 - \sigma^2\|_{L^p([0,1])}\, 1_{\{\sigma^2 \in B^s_{\pi,\infty}(c)\}}\big] \gtrsim \delta' - \varepsilon > 0$$

for large enough $n$. The proof of Theorem 2.12 is complete. □